From patchwork Mon Jun 14 13:04:39 2021
Date: Mon, 14 Jun 2021 13:04:39 +0000
Subject: [PATCH 01/10] diff --color-moved=zebra: fix alternate coloring
To: git@vger.kernel.org
From: Phillip Wood

b0a2ba4776 ("diff --color-moved=zebra: be stricter with color
alternation", 2018-11-23) sought to avoid using the alternate colors
unless there
are two adjacent moved blocks of the same sign. Unfortunately it
contains two bugs that prevented it from fixing the problem properly.
Firstly, `last_symbol` is reset at the start of each iteration of the
loop, losing the symbol of the last line, and secondly, when deciding
whether to use the alternate color, it should be checking whether the
current line has the same sign as the last line, not a different sign.
The combination of the two errors means that we still use the
alternate color when we should, but we also use it when we shouldn't.
This is most noticeable when using
--color-moved-ws=allow-indentation-change with hunks like

    -this line gets indented
    +    this line gets indented

where the post image is colored with newMovedAlternate rather than
newMoved. While this does not matter much, the next commit will change
the coloring to be correct in this case, so let's fix the bug here to
make it clear why the output is changing and add a regression test.

Signed-off-by: Phillip Wood
---
 diff.c                     |  4 +--
 t/t4015-diff-whitespace.sh | 72 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 74 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index 52c791574b71..cb068f8258c0 100644
--- a/diff.c
+++ b/diff.c
@@ -1142,6 +1142,7 @@ static void mark_color_as_moved(struct diff_options *o,
 	struct moved_block *pmb = NULL; /* potentially moved blocks */
 	int pmb_nr = 0, pmb_alloc = 0;
 	int n, flipped_block = 0, block_length = 0;
+	enum diff_symbol last_symbol = 0;
 
 
 	for (n = 0; n < o->emitted_symbols->nr; n++) {
@@ -1149,7 +1150,6 @@ static void mark_color_as_moved(struct diff_options *o,
 		struct moved_entry *key;
 		struct moved_entry *match = NULL;
 		struct emitted_diff_symbol *l = &o->emitted_symbols->buf[n];
-		enum diff_symbol last_symbol = 0;
 
 		switch (l->s) {
 		case DIFF_SYMBOL_PLUS:
@@ -1214,7 +1214,7 @@ static void mark_color_as_moved(struct diff_options *o,
 			}
 
 			if (adjust_last_block(o, n, block_length) &&
-			    pmb_nr && last_symbol != l->s)
+			    pmb_nr && last_symbol == l->s)
 				flipped_block = (flipped_block + 1) % 2;
 			else
 				flipped_block = 0;
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 2c13b62d3c65..920114cd795c 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1442,6 +1442,78 @@ test_expect_success 'detect permutations inside moved code -- dimmed-zebra' '
 	test_cmp expected actual
 '
 
+test_expect_success 'zebra alternate color is only used when necessary' '
+	cat >old.txt <<-\EOF &&
+	line 1A should be marked as oldMoved newMovedAlternate
+	line 1B should be marked as oldMoved newMovedAlternate
+	unchanged
+	line 2A should be marked as oldMoved newMovedAlternate
+	line 2B should be marked as oldMoved newMovedAlternate
+	line 3A should be marked as oldMovedAlternate newMoved
+	line 3B should be marked as oldMovedAlternate newMoved
+	unchanged
+	line 4A should be marked as oldMoved newMovedAlternate
+	line 4B should be marked as oldMoved newMovedAlternate
+	line 5A should be marked as oldMovedAlternate newMoved
+	line 5B should be marked as oldMovedAlternate newMoved
+	line 6A should be marked as oldMoved newMoved
+	line 6B should be marked as oldMoved newMoved
+	EOF
+	cat >new.txt <<-\EOF &&
+	  line 1A should be marked as oldMoved newMovedAlternate
+	  line 1B should be marked as oldMoved newMovedAlternate
+	unchanged
+	  line 3A should be marked as oldMovedAlternate newMoved
+	  line 3B should be marked as oldMovedAlternate newMoved
+	  line 2A should be marked as oldMoved newMovedAlternate
+	  line 2B should be marked as oldMoved newMovedAlternate
+	unchanged
+	  line 6A should be marked as oldMoved newMoved
+	  line 6B should be marked as oldMoved newMoved
+	  line 4A should be marked as oldMoved newMovedAlternate
+	  line 4B should be marked as oldMoved newMovedAlternate
+	  line 5A should be marked as oldMovedAlternate newMoved
+	  line 5B should be marked as oldMovedAlternate newMoved
+	EOF
+	test_expect_code 1 git diff --no-index --color --color-moved=zebra \
+		--color-moved-ws=allow-indentation-change \
+		old.txt new.txt >output &&
+	grep -v index output | test_decode_color >actual &&
+	cat >expected <<-\EOF &&
+	diff --git a/old.txt b/new.txt
+	--- a/old.txt
+	+++ b/new.txt
+	@@ -1,14 +1,14 @@
+	-line 1A should be marked as oldMoved newMovedAlternate
+	-line 1B should be marked as oldMoved newMovedAlternate
+	+  line 1A should be marked as oldMoved newMovedAlternate
+	+  line 1B should be marked as oldMoved newMovedAlternate
+	 unchanged
+	-line 2A should be marked as oldMoved newMovedAlternate
+	-line 2B should be marked as oldMoved newMovedAlternate
+	-line 3A should be marked as oldMovedAlternate newMoved
+	-line 3B should be marked as oldMovedAlternate newMoved
+	+  line 3A should be marked as oldMovedAlternate newMoved
+	+  line 3B should be marked as oldMovedAlternate newMoved
+	+  line 2A should be marked as oldMoved newMovedAlternate
+	+  line 2B should be marked as oldMoved newMovedAlternate
+	 unchanged
+	-line 4A should be marked as oldMoved newMovedAlternate
+	-line 4B should be marked as oldMoved newMovedAlternate
+	-line 5A should be marked as oldMovedAlternate newMoved
+	-line 5B should be marked as oldMovedAlternate newMoved
+	-line 6A should be marked as oldMoved newMoved
+	-line 6B should be marked as oldMoved newMoved
+	+  line 6A should be marked as oldMoved newMoved
+	+  line 6B should be marked as oldMoved newMoved
+	+  line 4A should be marked as oldMoved newMovedAlternate
+	+  line 4B should be marked as oldMoved newMovedAlternate
+	+  line 5A should be marked as oldMovedAlternate newMoved
+	+  line 5B should be marked as oldMovedAlternate newMoved
+	EOF
+	test_cmp expected actual
+'
+
 test_expect_success 'cmd option assumes configured colored-moved' '
 	test_config color.diff.oldMoved "magenta" &&
 	test_config color.diff.newMoved "cyan" &&
From patchwork Mon Jun 14 13:04:40 2021
Date: Mon, 14 Jun 2021 13:04:40 +0000
Subject: [PATCH 02/10] diff --color-moved: avoid false short line matches and bad zebra coloring
To: git@vger.kernel.org
From: Phillip Wood

When marking moved lines it is possible for a block of potentially
matched lines to extend past a change in sign when there is a sequence
of added lines whose text matches the text of a sequence of deleted
and added lines.
Most of the time either `match` will be NULL or `pmb_advance_or_null()`
will fail when the loop encounters a change of sign, but there are
corner cases where `match` is non-NULL and `pmb_advance_or_null()`
successfully advances the moved block despite the change in sign.

One consequence of this is highlighting a short line as moved when it
should not be. For example

    -moved line  # Correctly highlighted as moved
    +short line  # Wrongly highlighted as moved
     context
    +moved line  # Correctly highlighted as moved
    +short line
     context
    -short line

The other consequence is coloring a moved addition following a moved
deletion in the wrong color. In the example below the first
"+moved line 3" should be highlighted as newMoved, not
newMovedAlternate.

    -moved line 1 # Correctly highlighted as oldMoved
    -moved line 2 # Correctly highlighted as oldMovedAlternate
    +moved line 3 # Wrongly highlighted as newMovedAlternate
     context      # Everything else is highlighted correctly
    +moved line 2
    +moved line 3
     context
    +moved line 1
    -moved line 3

These false matches are more likely when using --color-moved-ws, with
the exception of --color-moved-ws=allow-indentation-change, which ties
the sign of the current whitespace delta to the sign of the line to
avoid this problem. The fix is to check that the sign of the new line
being matched is the same as the sign of the line that started the
block of potential matches.
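To make the two rules concrete, here is a minimal, self-contained sketch in C. It is not the diff.c implementation: the names `toy_line`, `toy_zebra()` and the `block` field are hypothetical, block membership is supplied up front instead of being discovered through the moved-line hashmaps, and the minimum block length handling done by `adjust_last_block()` is ignored. A line's `block` is the id of the block the matcher believes it extends, so the false match described above can be modelled by giving "+moved line 3" the same id as "-moved line 2". The sketch ends a block when either the id or the sign changes, and only flips to the alternate color when a new block starts directly after a moved block of the same sign.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for diff.c's context, minus and plus symbols. */
enum toy_sym { TOY_CONTEXT, TOY_MINUS, TOY_PLUS };

struct toy_line {
	enum toy_sym s; /* sign of the line */
	int block;      /* id of the block the matcher thinks it extends, 0 if unmatched */
	int alt;        /* output: 1 if the line gets the *MovedAlternate color */
};

/*
 * Toy zebra coloring: a block ends when the block id or the sign changes,
 * and a new block is only flipped to the alternate color when it starts
 * directly after a moved block of the same sign.
 */
static void toy_zebra(struct toy_line *lines, size_t nr)
{
	int cur_block = 0;                       /* block being extended, 0 if none */
	enum toy_sym moved_symbol = TOY_CONTEXT; /* sign of that block */
	int flipped = 0;                         /* is the current block alternate? */

	assert(lines != NULL || nr == 0);
	for (size_t n = 0; n < nr; n++) {
		struct toy_line *l = &lines[n];

		if (!l->block) {
			/* Unmatched line: end any block and reset the zebra state. */
			cur_block = 0;
			moved_symbol = TOY_CONTEXT;
			flipped = 0;
			l->alt = 0;
			continue;
		}

		if (l->block != cur_block || l->s != moved_symbol) {
			/* New block: flip only after an adjacent same-sign block. */
			if (cur_block && moved_symbol == l->s)
				flipped = !flipped;
			else
				flipped = 0;
			cur_block = l->block;
			moved_symbol = l->s;
		}
		l->alt = flipped;
	}
}
```

If the sign comparison is dropped from the new-block test (leaving only `l->block != cur_block`), a "+moved line 3" that falsely extends block 2 stays in that block and inherits its alternate color, which mirrors the newMovedAlternate miscoloring described above.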
Signed-off-by: Phillip Wood
---
 diff.c                     | 17 ++++++----
 t/t4015-diff-whitespace.sh | 65 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 76 insertions(+), 6 deletions(-)

diff --git a/diff.c b/diff.c
index cb068f8258c0..a0c43a104768 100644
--- a/diff.c
+++ b/diff.c
@@ -1142,7 +1142,7 @@ static void mark_color_as_moved(struct diff_options *o,
 	struct moved_block *pmb = NULL; /* potentially moved blocks */
 	int pmb_nr = 0, pmb_alloc = 0;
 	int n, flipped_block = 0, block_length = 0;
-	enum diff_symbol last_symbol = 0;
+	enum diff_symbol moved_symbol = 0;
 
 
 	for (n = 0; n < o->emitted_symbols->nr; n++) {
@@ -1168,7 +1168,7 @@ static void mark_color_as_moved(struct diff_options *o,
 			flipped_block = 0;
 		}
 
-		if (!match) {
+		if (pmb_nr && (!match || l->s != moved_symbol)) {
 			int i;
 
 			adjust_last_block(o, n, block_length);
@@ -1177,12 +1177,13 @@ static void mark_color_as_moved(struct diff_options *o,
 			pmb_nr = 0;
 			block_length = 0;
 			flipped_block = 0;
-			last_symbol = l->s;
+		}
+		if (!match) {
+			moved_symbol = 0;
 			continue;
 		}
 
 		if (o->color_moved == COLOR_MOVED_PLAIN) {
-			last_symbol = l->s;
 			l->flags |= DIFF_SYMBOL_MOVED_LINE;
 			continue;
 		}
@@ -1214,11 +1215,16 @@ static void mark_color_as_moved(struct diff_options *o,
 			}
 
 			if (adjust_last_block(o, n, block_length) &&
-			    pmb_nr && last_symbol == l->s)
+			    pmb_nr && moved_symbol == l->s)
 				flipped_block = (flipped_block + 1) % 2;
 			else
 				flipped_block = 0;
 
+			if (pmb_nr)
+				moved_symbol = l->s;
+			else
+				moved_symbol = 0;
+
 			block_length = 0;
 		}
 
@@ -1228,7 +1234,6 @@ static void mark_color_as_moved(struct diff_options *o,
 			if (flipped_block && o->color_moved != COLOR_MOVED_BLOCKS)
 				l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT;
 		}
-		last_symbol = l->s;
 	}
 
 	adjust_last_block(o, n, block_length);
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 920114cd795c..3119a59f071d 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1514,6 +1514,71 @@ test_expect_success 'zebra alternate color is only used when necessary' '
 	test_cmp expected actual
 '
 
+test_expect_success 'short lines of opposite sign do not get marked as moved' '
+	cat >old.txt <<-\EOF &&
+	this line should be marked as moved
+	unchanged
+	unchanged
+	unchanged
+	unchanged
+	too short
+	this line should be marked as oldMoved newMoved
+	this line should be marked as oldMovedAlternate newMoved
+	unchanged 1
+	unchanged 2
+	unchanged 3
+	unchanged 4
+	this line should be marked as oldMoved newMoved/newMovedAlternate
+	EOF
+	cat >new.txt <<-\EOF &&
+	too short
+	unchanged
+	unchanged
+	this line should be marked as moved
+	too short
+	unchanged
+	unchanged
+	this line should be marked as oldMoved newMoved/newMovedAlternate
+	unchanged 1
+	unchanged 2
+	this line should be marked as oldMovedAlternate newMoved
+	this line should be marked as oldMoved newMoved/newMovedAlternate
+	unchanged 3
+	this line should be marked as oldMoved newMoved
+	unchanged 4
+	EOF
+	test_expect_code 1 git diff --no-index --color --color-moved=zebra \
+		old.txt new.txt >output && cat output &&
+	grep -v index output | test_decode_color >actual &&
+	cat >expect <<-\EOF &&
+	diff --git a/old.txt b/new.txt
+	--- a/old.txt
+	+++ b/new.txt
+	@@ -1,13 +1,15 @@
+	-this line should be marked as moved
+	+too short
+	 unchanged
+	 unchanged
+	+this line should be marked as moved
+	+too short
+	 unchanged
+	 unchanged
+	-too short
+	-this line should be marked as oldMoved newMoved
+	-this line should be marked as oldMovedAlternate newMoved
+	+this line should be marked as oldMoved newMoved/newMovedAlternate
+	 unchanged 1
+	 unchanged 2
+	+this line should be marked as oldMovedAlternate newMoved
+	+this line should be marked as oldMoved newMoved/newMovedAlternate
+	 unchanged 3
+	+this line should be marked as oldMoved newMoved
+	 unchanged 4
+	-this line should be marked as oldMoved newMoved/newMovedAlternate
+	EOF
+	test_cmp expect actual
+'
+
 test_expect_success 'cmd option assumes configured colored-moved' '
 	test_config color.diff.oldMoved "magenta" &&
 	test_config color.diff.newMoved "cyan" &&
From patchwork Mon Jun 14 13:04:41 2021
Date: Mon, 14 Jun 2021 13:04:41 +0000
Subject: [PATCH 03/10] diff: simplify allow-indentation-change delta calculation
To: git@vger.kernel.org
From: Phillip Wood

Now that we reliably end a block when the sign changes we don't need
the whitespace delta calculation to rely on the sign.
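As a rough illustration, the following C sketch computes an indentation delta the same way whatever the sign of the lines: the width of one line's indentation minus the other's, provided the text after the indentation is identical. Previously the sign-dependent negation also helped stop a block from continuing across a change of sign; once blocks end explicitly at a sign change, that negation is no longer needed. The helpers `toy_indent()` and `toy_ws_delta()` are hypothetical simplifications of `compute_ws_delta()`: they assume a tab width of 8, compare NUL-terminated strings, and skip the INDENT_BLANKLINE special case.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: width of the leading whitespace of a line and,
 * via *off, the offset of the first non-blank character. A tab advances
 * to the next multiple of 8 (an assumed tab width).
 */
static int toy_indent(const char *line, int *off)
{
	int width = 0, i;

	assert(line != NULL && off != NULL);
	for (i = 0; line[i] == ' ' || line[i] == '\t'; i++)
		width = line[i] == '\t' ? (width / 8 + 1) * 8 : width + 1;
	*off = i;
	return width;
}

/*
 * Simplified analogue of compute_ws_delta() after this patch: if 'a' (the
 * current line) and 'b' (a candidate match) differ only in indentation,
 * store a_width - b_width in *out and return 1. The sign of the diff
 * lines is never consulted. Blank lines are not special-cased here.
 */
static int toy_ws_delta(const char *a, const char *b, int *out)
{
	int a_off, b_off;
	int a_width = toy_indent(a, &a_off);
	int b_width = toy_indent(b, &b_off);

	if (strcmp(a + a_off, b + b_off))
		return 0; /* text after the indentation differs */
	*out = a_width - b_width;
	return 1;
}
```

For example, comparing "    foo" with "foo" yields a delta of 4, and "foo" with "\tfoo" yields -8.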
Signed-off-by: Phillip Wood
---
 diff.c | 13 ++-----------
 1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/diff.c b/diff.c
index a0c43a104768..19c8954ec546 100644
--- a/diff.c
+++ b/diff.c
@@ -864,23 +864,17 @@ static int compute_ws_delta(const struct emitted_diff_symbol *a,
 	    a_width = a->indent_width,
 	    b_off = b->indent_off,
 	    b_width = b->indent_width;
-	int delta;
 
 	if (a_width == INDENT_BLANKLINE && b_width == INDENT_BLANKLINE) {
 		*out = INDENT_BLANKLINE;
 		return 1;
 	}
 
-	if (a->s == DIFF_SYMBOL_PLUS)
-		delta = a_width - b_width;
-	else
-		delta = b_width - a_width;
-
 	if (a_len - a_off != b_len - b_off ||
 	    memcmp(a->line + a_off, b->line + b_off, a_len - a_off))
 		return 0;
 
-	*out = delta;
+	*out = a_width - b_width;
 	return 1;
 }
 
@@ -924,10 +918,7 @@ static int cmp_in_block_with_wsd(const struct diff_options *o,
 	 * match those of the current block and that the text of 'l' and 'cur'
 	 * after the indentation match.
 	 */
-	if (cur->es->s == DIFF_SYMBOL_PLUS)
-		delta = a_width - c_width;
-	else
-		delta = c_width - a_width;
+	delta = c_width - a_width;
 
 	/*
 	 * If the previous lines of this block were all blank then set its
From patchwork Mon Jun 14 13:04:42 2021
Date: Mon, 14 Jun 2021 13:04:42 +0000
Subject: [PATCH 04/10] diff --color-moved-ws=allow-indentation-change: simplify and optimize
To: git@vger.kernel.org
From: Phillip Wood

If we already have a block of potentially moved lines then as we move
down the diff we need to check if the next line of each potentially
moved line matches the current line of the diff. The implementation of
--color-moved-ws=allow-indentation-change was needlessly performing
this check on all the lines in the diff that matched the current line
rather than just the current line. To exacerbate the problem, finding
all the other lines in the diff that match the current line involves a
fuzzy lookup, so we were wasting even more time performing a second
comparison to filter out the non-matching lines. Fixing this reduces
the time to run

    git diff --color-moved --color-moved-ws=allow-indentation-change v2.28.0 v2.29.0

by 88% and simplifies the code.
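A simplified sketch of the new approach, in C: each potential block only needs its own next line compared with the current line, so the work per diff line is proportional to the number of potential blocks, rather than to that number multiplied by the number of fuzzy hashmap matches for the current line. `toy_entry`, `toy_block` and `toy_advance_or_null()` are hypothetical stand-ins for `struct moved_entry`, `struct moved_block` and `pmb_advance_or_null_multi_match()`; plain `strcmp()` replaces the indentation-aware `cmp_in_block_with_wsd()`, and an ended block is marked by setting `match` to NULL instead of calling `moved_block_clear()`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct moved_entry: one line of the other side. */
struct toy_entry {
	const char *text;
	struct toy_entry *next_line; /* following line on the same side, or NULL */
};

/* Hypothetical stand-in for struct moved_block: one potential block. */
struct toy_block {
	struct toy_entry *match; /* last line matched so far, NULL once ended */
};

/*
 * Advance every potential block by comparing only its next line with the
 * current line of the diff. No walk over the other lines that match the
 * current line is needed.
 */
static void toy_advance_or_null(const char *line, struct toy_block *pmb,
				size_t pmb_nr)
{
	assert(pmb != NULL || pmb_nr == 0);
	for (size_t i = 0; i < pmb_nr; i++) {
		struct toy_entry *prev = pmb[i].match;
		struct toy_entry *cur = prev ? prev->next_line : NULL;

		if (cur && !strcmp(cur->text, line))
			pmb[i].match = cur;  /* block continues */
		else
			pmb[i].match = NULL; /* block ends */
	}
}
```

With one block positioned at "a" (next line "b") and another at "x" (next line "y"), a current line of "b" advances the first block and ends the second.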
Before this change
Benchmark #1: bin-wrappers/git diff --diff-algorithm=myers --color-moved --color-moved-ws=allow-indentation-change v2.28.0 v2.29.0
  Time (mean ± σ):      9.978 s ±  0.042 s    [User: 9.905 s, System: 0.057 s]
  Range (min … max):    9.917 s … 10.037 s    10 runs

After this change
Benchmark #1: bin-wrappers/git diff --diff-algorithm=myers --color-moved --color-moved-ws=allow-indentation-change v2.28.0 v2.29.0
  Time (mean ± σ):      1.220 s ±  0.004 s    [User: 1.160 s, System: 0.058 s]
  Range (min … max):    1.214 s …  1.226 s    10 runs

Signed-off-by: Phillip Wood
---
 diff.c | 65 ++++++++++++++++------------------------------------------
 1 file changed, 18 insertions(+), 47 deletions(-)

diff --git a/diff.c b/diff.c
index 19c8954ec546..5d5d168107a6 100644
--- a/diff.c
+++ b/diff.c
@@ -881,35 +881,20 @@ static int compute_ws_delta(const struct emitted_diff_symbol *a,
 
 static int cmp_in_block_with_wsd(const struct diff_options *o,
 				 const struct moved_entry *cur,
-				 const struct moved_entry *match,
-				 struct moved_block *pmb,
-				 int n)
+				 const struct emitted_diff_symbol *l,
+				 struct moved_block *pmb)
 {
-	struct emitted_diff_symbol *l = &o->emitted_symbols->buf[n];
-	int al = cur->es->len, bl = match->es->len, cl = l->len;
+	int al = cur->es->len, bl = l->len;
 	const char *a = cur->es->line,
-		   *b = match->es->line,
-		   *c = l->line;
+		   *b = l->line;
 	int a_off = cur->es->indent_off,
 	    a_width = cur->es->indent_width,
-	    c_off = l->indent_off,
-	    c_width = l->indent_width;
+	    b_off = l->indent_off,
+	    b_width = l->indent_width;
 	int delta;
 
-	/*
-	 * We need to check if 'cur' is equal to 'match'. As those
-	 * are from the same (+/-) side, we do not need to adjust for
-	 * indent changes. However these were found using fuzzy
-	 * matching so we do have to check if they are equal. Here we
-	 * just check the lengths. We delay calling memcmp() to check
-	 * the contents until later as if the length comparison for a
-	 * and c fails we can avoid the call all together.
-	 */
-	if (al != bl)
-		return 1;
-
 	/* If 'l' and 'cur' are both blank then they match. */
-	if (a_width == INDENT_BLANKLINE && c_width == INDENT_BLANKLINE)
+	if (a_width == INDENT_BLANKLINE && b_width == INDENT_BLANKLINE)
 		return 0;
 
 	/*
@@ -918,7 +903,7 @@ static int cmp_in_block_with_wsd(const struct diff_options *o,
 	 * match those of the current block and that the text of 'l' and 'cur'
 	 * after the indentation match.
 	 */
-	delta = c_width - a_width;
+	delta = b_width - a_width;
 
 	/*
 	 * If the previous lines of this block were all blank then set its
@@ -927,9 +912,8 @@ static int cmp_in_block_with_wsd(const struct diff_options *o,
 	if (pmb->wsd == INDENT_BLANKLINE)
 		pmb->wsd = delta;
 
-	return !(delta == pmb->wsd && al - a_off == cl - c_off &&
-		 !memcmp(a, b, al) && !
-		 memcmp(a + a_off, c + c_off, al - a_off));
+	return !(delta == pmb->wsd && al - a_off == bl - b_off &&
+		 !memcmp(a + a_off, b + b_off, al - a_off));
 }
 
 static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
@@ -1030,36 +1014,23 @@ static void pmb_advance_or_null(struct diff_options *o,
 }
 
 static void pmb_advance_or_null_multi_match(struct diff_options *o,
-					    struct moved_entry *match,
-					    struct hashmap *hm,
+					    struct emitted_diff_symbol *l,
 					    struct moved_block *pmb,
-					    int pmb_nr, int n)
+					    int pmb_nr)
 {
 	int i;
-	char *got_match = xcalloc(1, pmb_nr);
-
-	hashmap_for_each_entry_from(hm, match, ent) {
-		for (i = 0; i < pmb_nr; i++) {
-			struct moved_entry *prev = pmb[i].match;
-			struct moved_entry *cur = (prev && prev->next_line) ?
-					prev->next_line : NULL;
-			if (!cur)
-				continue;
-			if (!cmp_in_block_with_wsd(o, cur, match, &pmb[i], n))
-				got_match[i] |= 1;
-		}
-	}
 
 	for (i = 0; i < pmb_nr; i++) {
-		if (got_match[i]) {
+		struct moved_entry *prev = pmb[i].match;
+		struct moved_entry *cur = (prev && prev->next_line) ?
+			prev->next_line : NULL;
+		if (cur && !cmp_in_block_with_wsd(o, cur, l, &pmb[i])) {
 			/* Advance to the next line */
-			pmb[i].match = pmb[i].match->next_line;
+			pmb[i].match = cur;
 		} else {
 			moved_block_clear(&pmb[i]);
 		}
 	}
-
-	free(got_match);
 }
 
 static int shrink_potential_moved_blocks(struct moved_block *pmb,
@@ -1181,7 +1152,7 @@ static void mark_color_as_moved(struct diff_options *o,
 
 		if (o->color_moved_ws_handling &
 		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
-			pmb_advance_or_null_multi_match(o, match, hm, pmb, pmb_nr, n);
+			pmb_advance_or_null_multi_match(o, l, pmb, pmb_nr);
 		else
 			pmb_advance_or_null(o, match, hm, pmb, pmb_nr);
 

From patchwork Mon Jun 14 13:04:43 2021
X-Patchwork-Id: 12318925
Date: Mon, 14 Jun 2021 13:04:43 +0000
Subject: [PATCH 05/10] diff --color-moved: call comparison function directly
To: git@vger.kernel.org
From: Phillip Wood

Calling xdiff_compare_lines() directly rather than using a function
pointer from the hash map reduces the time very slightly, but more
importantly it will allow us to easily combine pmb_advance_or_null()
and pmb_advance_or_null_multi_match() in the next commit.

Before this change
Benchmark #1: bin-wrappers/git diff --diff-algorithm=myers --color-moved --no-color-moved-ws v2.28.0 v2.29.0
  Time (mean ± σ):      1.136 s ±  0.004 s    [User: 1.079 s, System: 0.053 s]
  Range (min … max):    1.130 s …  1.141 s    10 runs

After this change
Benchmark #1: bin-wrappers/git diff --diff-algorithm=myers --color-moved --no-color-moved-ws v2.28.0 v2.29.0
  Time (mean ± σ):      1.118 s ±  0.003 s    [User: 1.062 s, System: 0.053 s]
  Range (min … max):    1.114 s …  1.121 s    10 runs

Signed-off-by: Phillip Wood
---
 diff.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/diff.c b/diff.c
index 5d5d168107a6..c8fdfc9049bb 100644
--- a/diff.c
+++ b/diff.c
@@ -995,17 +995,20 @@ static void add_lines_to_move_detection(struct diff_options *o,
 }
 
 static void pmb_advance_or_null(struct diff_options *o,
-				struct moved_entry *match,
-				struct hashmap *hm,
+				struct emitted_diff_symbol *l,
 				struct moved_block *pmb,
 				int pmb_nr)
 {
 	int i;
+	unsigned flags = o->color_moved_ws_handling & XDF_WHITESPACE_FLAGS;
+
 	for (i = 0; i < pmb_nr; i++) {
 		struct moved_entry *prev = pmb[i].match;
 		struct moved_entry *cur = (prev && prev->next_line) ?
 			prev->next_line : NULL;
-		if (cur && !hm->cmpfn(o, &cur->ent, &match->ent, NULL)) {
+		if (cur && xdiff_compare_lines(cur->es->line, cur->es->len,
+					       l->line, l->len,
+					       flags)) {
 			pmb[i].match = cur;
 		} else {
 			pmb[i].match = NULL;
@@ -1154,7 +1157,7 @@ static void mark_color_as_moved(struct diff_options *o,
 		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
 			pmb_advance_or_null_multi_match(o, l, pmb, pmb_nr);
 		else
-			pmb_advance_or_null(o, match, hm, pmb, pmb_nr);
+			pmb_advance_or_null(o, l, pmb, pmb_nr);
 
 		pmb_nr = shrink_potential_moved_blocks(pmb, pmb_nr);
 

From patchwork Mon Jun 14 13:04:44 2021
X-Patchwork-Id: 12318927
Message-Id: <050cef0081dc8252b55a4d69cca3bd08ae4eff98.1623675889.git.gitgitgadget@gmail.com>
Date: Mon, 14 Jun 2021 13:04:44 +0000
Subject: [PATCH 06/10] diff --color-moved: unify moved block growth functions
To: git@vger.kernel.org
From: Phillip Wood

After the last two commits, pmb_advance_or_null() and
pmb_advance_or_null_multi_match() differ only in the comparison they
perform. Let's simplify the code by combining them into a single
function.

Signed-off-by: Phillip Wood
---
 diff.c | 41 ++++++++++++-----------------------------
 1 file changed, 12 insertions(+), 29 deletions(-)

diff --git a/diff.c b/diff.c
index c8fdfc9049bb..de6522a3a860 100644
--- a/diff.c
+++ b/diff.c
@@ -1003,36 +1003,23 @@ static void pmb_advance_or_null(struct diff_options *o,
 	unsigned flags = o->color_moved_ws_handling & XDF_WHITESPACE_FLAGS;
 
 	for (i = 0; i < pmb_nr; i++) {
+		int match;
 		struct moved_entry *prev = pmb[i].match;
 		struct moved_entry *cur = (prev && prev->next_line) ?
 			prev->next_line : NULL;
-		if (cur && xdiff_compare_lines(cur->es->line, cur->es->len,
-					       l->line, l->len,
-					       flags)) {
-			pmb[i].match = cur;
-		} else {
-			pmb[i].match = NULL;
-		}
-	}
-}
 
-static void pmb_advance_or_null_multi_match(struct diff_options *o,
-					    struct emitted_diff_symbol *l,
-					    struct moved_block *pmb,
-					    int pmb_nr)
-{
-	int i;
-
-	for (i = 0; i < pmb_nr; i++) {
-		struct moved_entry *prev = pmb[i].match;
-		struct moved_entry *cur = (prev && prev->next_line) ?
-			prev->next_line : NULL;
-		if (cur && !cmp_in_block_with_wsd(o, cur, l, &pmb[i])) {
-			/* Advance to the next line */
+		if (o->color_moved_ws_handling &
+		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
+			match = cur &&
+				!cmp_in_block_with_wsd(o, cur, l, &pmb[i]);
+		else
+			match = cur &&
+				xdiff_compare_lines(cur->es->line, cur->es->len,
+						    l->line, l->len, flags);
+		if (match)
 			pmb[i].match = cur;
-		} else {
+		else
 			moved_block_clear(&pmb[i]);
-		}
 	}
 }
 
@@ -1153,11 +1140,7 @@ static void mark_color_as_moved(struct diff_options *o,
 			continue;
 		}
 
-		if (o->color_moved_ws_handling &
-		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
-			pmb_advance_or_null_multi_match(o, l, pmb, pmb_nr);
-		else
-			pmb_advance_or_null(o, l, pmb, pmb_nr);
+		pmb_advance_or_null(o, l, pmb, pmb_nr);
 
 		pmb_nr = shrink_potential_moved_blocks(pmb, pmb_nr);
 

From patchwork Mon Jun 14 13:04:45 2021
X-Patchwork-Id: 12318915
Message-Id: <9390e9a66ebf3c2ccd2b47958134dee8c3c9c553.1623675889.git.gitgitgadget@gmail.com>
Date: Mon, 14 Jun 2021 13:04:45 +0000
Subject: [PATCH 07/10] diff --color-moved: shrink potential moved blocks as we go
To: git@vger.kernel.org
From: Phillip Wood

Rather than setting `match` to NULL and then looping over the list of
potentially matched blocks a second time to remove blocks with no
matches, just filter out the blocks with no matches as we go.

Signed-off-by: Phillip Wood
---
 diff.c | 42 ++++++------------------------------------
 1 file changed, 6 insertions(+), 36 deletions(-)

diff --git a/diff.c b/diff.c
index de6522a3a860..f60cce654c14 100644
--- a/diff.c
+++ b/diff.c
@@ -997,12 +997,12 @@ static void add_lines_to_move_detection(struct diff_options *o,
 static void pmb_advance_or_null(struct diff_options *o,
 				struct emitted_diff_symbol *l,
 				struct moved_block *pmb,
-				int pmb_nr)
+				int *pmb_nr)
 {
-	int i;
+	int i, j;
 	unsigned flags = o->color_moved_ws_handling & XDF_WHITESPACE_FLAGS;
 
-	for (i = 0; i < pmb_nr; i++) {
+	for (i = 0, j = 0; i < *pmb_nr; i++) {
 		int match;
 		struct moved_entry *prev = pmb[i].match;
 		struct moved_entry *cur = (prev && prev->next_line) ?
@@ -1017,37 +1017,9 @@ static void pmb_advance_or_null(struct diff_options *o,
 				xdiff_compare_lines(cur->es->line, cur->es->len,
 						    l->line, l->len, flags);
 		if (match)
-			pmb[i].match = cur;
-		else
-			moved_block_clear(&pmb[i]);
+			pmb[j++].match = cur;
 	}
-}
-
-static int shrink_potential_moved_blocks(struct moved_block *pmb,
-					 int pmb_nr)
-{
-	int lp, rp;
-
-	/* Shrink the set of potential block to the remaining running */
-	for (lp = 0, rp = pmb_nr - 1; lp <= rp;) {
-		while (lp < pmb_nr && pmb[lp].match)
-			lp++;
-		/* lp points at the first NULL now */
-
-		while (rp > -1 && !pmb[rp].match)
-			rp--;
-		/* rp points at the last non-NULL */
-
-		if (lp < pmb_nr && rp > -1 && lp < rp) {
-			pmb[lp] = pmb[rp];
-			memset(&pmb[rp], 0, sizeof(pmb[rp]));
-			rp--;
-			lp++;
-		}
-	}
-
-	/* Remember the number of running sets */
-	return rp + 1;
+	*pmb_nr = j;
 }
 
 /*
@@ -1140,9 +1112,7 @@ static void mark_color_as_moved(struct diff_options *o,
 			continue;
 		}
 
-		pmb_advance_or_null(o, l, pmb, pmb_nr);
-
-		pmb_nr = shrink_potential_moved_blocks(pmb, pmb_nr);
+		pmb_advance_or_null(o, l, pmb, &pmb_nr);
 
 		if (pmb_nr == 0) {
 			/*

From patchwork Mon Jun 14 13:04:46 2021
X-Patchwork-Id: 12318931
Message-Id: <1de99ac2bc3c60bc4639687d5ad2e2638aa9cadb.1623675889.git.gitgitgadget@gmail.com>
Date: Mon, 14 Jun 2021 13:04:46 +0000
Subject: [PATCH 08/10] diff --color-moved: stop clearing potential moved blocks
To: git@vger.kernel.org
From: Phillip Wood

moved_block_clear() was introduced in 74d156f4a1 ("diff
--color-moved-ws: fix double free crash", 2018-10-04) to free the
memory that was allocated when initializing a potential moved
block. However, since 21536d077f ("diff --color-moved-ws: modify
allow-indentation-change", 2018-11-23), initializing a potential moved
block no longer allocates any memory. Up until the last commit we were
relying on moved_block_clear() to set the `match` pointer to NULL when
a block stopped matching, but since that commit we no longer clear a
moved block that stops matching, so it does not make sense to clear
them elsewhere.
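The in-place filtering introduced by the last commit is what makes clearing redundant. The following is a minimal sketch of that idea under simplifying assumptions. The hypothetical `struct block` stands in for `struct moved_block`, an integer index `pos` stands in for the `match` pointer, and each line is reduced to an int. Live blocks are advanced and packed towards the front of the array, and slots at or beyond the returned count are never read again, so nothing needs to be zeroed.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for struct moved_block: `pos` plays the role
 * of the `match` pointer (index of the last matched line) and `wsd`
 * is per-block state that must stay paired with its block.
 */
struct block {
	int pos;
	int wsd;
};

/*
 * Advance each candidate block by one line and compact the array in
 * place, keeping only the blocks whose next line equals `line`.
 * `lines` holds the moved-from side, one int per line. Returns the new
 * number of live blocks. Slots at or beyond the returned count are
 * stale, but they are never read, so they are not cleared.
 */
static int advance_and_compact(struct block *b, int nr,
			       const int *lines, int nlines, int line)
{
	int i, j;

	assert(nr >= 0 && nlines >= 0);
	for (i = 0, j = 0; i < nr; i++) {
		int next = b[i].pos + 1;

		if (next < nlines && lines[next] == line) {
			b[j] = b[i];		/* keep wsd with its block */
			b[j++].pos = next;
		}
	}
	return j;
}
```

The sketch copies the whole element before it updates the position. That way any per-block state, here `wsd`, moves together with the block it describes.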
Signed-off-by: Phillip Wood
---
 diff.c | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/diff.c b/diff.c
index f60cce654c14..ee58373f55f8 100644
--- a/diff.c
+++ b/diff.c
@@ -807,11 +807,6 @@ struct moved_block {
 	int wsd; /* The whitespace delta of this block */
 };
 
-static void moved_block_clear(struct moved_block *b)
-{
-	memset(b, 0, sizeof(*b));
-}
-
 #define INDENT_BLANKLINE INT_MIN
 
 static void fill_es_indent_data(struct emitted_diff_symbol *es)
@@ -1093,11 +1088,7 @@ static void mark_color_as_moved(struct diff_options *o,
 		}
 
 		if (pmb_nr && (!match || l->s != moved_symbol)) {
-			int i;
-
 			adjust_last_block(o, n, block_length);
-			for(i = 0; i < pmb_nr; i++)
-				moved_block_clear(&pmb[i]);
 			pmb_nr = 0;
 			block_length = 0;
 			flipped_block = 0;
@@ -1155,8 +1146,6 @@ static void mark_color_as_moved(struct diff_options *o,
 	}
 
 	adjust_last_block(o, n, block_length);
-	for(n = 0; n < pmb_nr; n++)
-		moved_block_clear(&pmb[n]);
 	free(pmb);
 }
 

From patchwork Mon Jun 14 13:04:47 2021
X-Patchwork-Id: 12318917
Message-Id: <41cdedd60907b966dffa6cf0c9825ffb448f4971.1623675889.git.gitgitgadget@gmail.com>
Date: Mon, 14 Jun 2021 13:04:47 +0000
Subject: [PATCH 09/10] diff --color-moved-ws=allow-indentation-change: improve hash lookups
To: git@vger.kernel.org
From: Phillip Wood

As libxdiff does not have a whitespace flag to ignore the indentation,
the code for --color-moved-ws=allow-indentation-change uses
XDF_IGNORE_WHITESPACE and then filters out any hash lookups where
there are non-indentation changes. This filtering is inefficient as we
have to perform another string comparison.

By using the offset data that we have already computed to skip the
indentation, we can avoid using XDF_IGNORE_WHITESPACE and safely
remove the extra checks. This improves the performance by 14% and
paves the way for the elimination of string comparisons in the next
commit.

This change slightly increases the runtime of other --color-moved
modes. This could be avoided by using different comparison functions
for the different modes, but after the changes in the next commit
there is no measurable benefit.
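The idea of hashing and comparing only the text after the indentation can be sketched on its own. With this approach, lines that differ only in leading whitespace fall into the same bucket, and the change in indentation is computed separately. This is a minimal illustration, not the diff.c code. `struct line_info`, `analyse()`, `BLANK_LINE`, the tab width of 8, and the FNV-1a hash (standing in for xdiff_hash_string()) are all assumptions, and only spaces and tabs are treated as indentation.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <string.h>

#define BLANK_LINE INT_MIN	/* hypothetical, mirrors INDENT_BLANKLINE */
#define TAB_WIDTH 8		/* assumed tab width */

struct line_info {
	const char *text;
	size_t len;
	size_t indent_off;	/* bytes of leading spaces/tabs */
	int indent_width;	/* display width of the indent, or BLANK_LINE */
};

static struct line_info analyse(const char *text)
{
	struct line_info li;
	int width = 0;

	assert(text != NULL);
	li.text = text;
	li.len = strlen(text);
	li.indent_off = 0;
	while (li.indent_off < li.len &&
	       (text[li.indent_off] == ' ' || text[li.indent_off] == '\t')) {
		if (text[li.indent_off] == '\t')
			width += TAB_WIDTH - width % TAB_WIDTH;
		else
			width++;
		li.indent_off++;
	}
	/* A line made up only of whitespace is treated as blank. */
	li.indent_width = li.indent_off == li.len ? BLANK_LINE : width;
	return li;
}

/* FNV-1a over the text after the indent; a stand-in for xdiff_hash_string(). */
static uint32_t hash_after_indent(const struct line_info *li)
{
	uint32_t h = 2166136261u;
	size_t i;

	for (i = li->indent_off; i < li->len; i++) {
		h ^= (unsigned char)li->text[i];
		h *= 16777619u;
	}
	return h;
}

/* Nonzero when the two lines are identical once indentation is skipped. */
static int equal_after_indent(const struct line_info *a,
			      const struct line_info *b)
{
	size_t alen = a->len - a->indent_off;
	size_t blen = b->len - b->indent_off;

	return alen == blen &&
	       !memcmp(a->text + a->indent_off, b->text + b->indent_off, alen);
}

/*
 * Change in indentation width, in the spirit of compute_ws_delta().
 * This is intended for lines that are equal after their indentation.
 * Such lines are either both blank or both non-blank.
 */
static int ws_delta(const struct line_info *a, const struct line_info *b)
{
	if (a->indent_width == BLANK_LINE || b->indent_width == BLANK_LINE)
		return BLANK_LINE;
	return a->indent_width - b->indent_width;
}
```

Because the hash and the equality test both start at `indent_off`, a tab-indented line and a space-indented copy of it become hash-table neighbours without a whitespace-ignoring flag. That is why the extra filtering comparison is no longer needed.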
Before this change Benchmark #1: bin-wrappers/git diff --diff-algorithm=myers --color-moved --no-color-moved-ws v2.28.0 v2.29.0 Time (mean ± σ): 1.116 s ± 0.005 s [User: 1.057 s, System: 0.056 s] Range (min … max): 1.109 s … 1.123 s 10 runs Benchmark #2: bin-wrappers/git diff --diff-algorithm=myers --color-moved --color-moved-ws=allow-indentation-change v2.28.0 v2.29.0 Time (mean ± σ): 1.216 s ± 0.005 s [User: 1.155 s, System: 0.059 s] Range (min … max): 1.206 s … 1.223 s 10 runs After this change Benchmark #1: bin-wrappers/git diff --diff-algorithm=myers --color-moved --no-color-moved-ws v2.28.0 v2.29.0 Time (mean ± σ): 1.147 s ± 0.005 s [User: 1.085 s, System: 0.059 s] Range (min … max): 1.140 s … 1.154 s 10 runs Benchmark #2: bin-wrappers/git diff --diff-algorithm=myers --color-moved --color-moved-ws=allow-indentation-change v2.28.0 v2.29.0 Time (mean ± σ): 1.048 s ± 0.005 s [User: 987.4 ms, System: 58.8 ms] Range (min … max): 1.043 s … 1.056 s 10 runs Signed-off-by: Phillip Wood --- diff.c | 66 +++++++++++++++++----------------------------------------- 1 file changed, 19 insertions(+), 47 deletions(-) diff --git a/diff.c b/diff.c index ee58373f55f8..e6f3586b39bf 100644 --- a/diff.c +++ b/diff.c @@ -850,28 +850,15 @@ static void fill_es_indent_data(struct emitted_diff_symbol *es) } static int compute_ws_delta(const struct emitted_diff_symbol *a, - const struct emitted_diff_symbol *b, - int *out) -{ - int a_len = a->len, - b_len = b->len, - a_off = a->indent_off, - a_width = a->indent_width, - b_off = b->indent_off, + const struct emitted_diff_symbol *b) +{ + int a_width = a->indent_width, b_width = b->indent_width; - if (a_width == INDENT_BLANKLINE && b_width == INDENT_BLANKLINE) { - *out = INDENT_BLANKLINE; - return 1; - } - - if (a_len - a_off != b_len - b_off || - memcmp(a->line + a_off, b->line + b_off, a_len - a_off)) - return 0; - - *out = a_width - b_width; + if (a_width == INDENT_BLANKLINE && b_width == INDENT_BLANKLINE) + return INDENT_BLANKLINE; - 
return 1; + return a_width - b_width; } static int cmp_in_block_with_wsd(const struct diff_options *o, @@ -917,26 +904,17 @@ static int moved_entry_cmp(const void *hashmap_cmp_fn_data, const void *keydata) { const struct diff_options *diffopt = hashmap_cmp_fn_data; - const struct moved_entry *a, *b; + const struct emitted_diff_symbol *a, *b; unsigned flags = diffopt->color_moved_ws_handling & XDF_WHITESPACE_FLAGS; - a = container_of(eptr, const struct moved_entry, ent); - b = container_of(entry_or_key, const struct moved_entry, ent); - - if (diffopt->color_moved_ws_handling & - COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) - /* - * As there is not specific white space config given, - * we'd need to check for a new block, so ignore all - * white space. The setup of the white space - * configuration for the next block is done else where - */ - flags |= XDF_IGNORE_WHITESPACE; + a = container_of(eptr, const struct moved_entry, ent)->es; + b = container_of(entry_or_key, const struct moved_entry, ent)->es; - return !xdiff_compare_lines(a->es->line, a->es->len, - b->es->line, b->es->len, - flags); + return !xdiff_compare_lines(a->line + a->indent_off, + a->len - a->indent_off, + b->line + b->indent_off, + b->len - b->indent_off, flags); } static struct moved_entry *prepare_entry(struct diff_options *o, @@ -945,7 +923,8 @@ static struct moved_entry *prepare_entry(struct diff_options *o, struct moved_entry *ret = xmalloc(sizeof(*ret)); struct emitted_diff_symbol *l = &o->emitted_symbols->buf[line_no]; unsigned flags = o->color_moved_ws_handling & XDF_WHITESPACE_FLAGS; - unsigned int hash = xdiff_hash_string(l->line, l->len, flags); + unsigned int hash = xdiff_hash_string(l->line + l->indent_off, + l->len - l->indent_off, flags); hashmap_entry_init(&ret->ent, hash); ret->es = l; @@ -1113,14 +1092,11 @@ static void mark_color_as_moved(struct diff_options *o, hashmap_for_each_entry_from(hm, match, ent) { ALLOC_GROW(pmb, pmb_nr + 1, pmb_alloc); if (o->color_moved_ws_handling & - 
COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) {
-				if (compute_ws_delta(l, match->es,
-						     &pmb[pmb_nr].wsd))
-					pmb[pmb_nr++].match = match;
-			} else {
+			    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
+				pmb[pmb_nr].wsd = compute_ws_delta(l, match->es);
+			else
 				pmb[pmb_nr].wsd = 0;
-				pmb[pmb_nr++].match = match;
-			}
+			pmb[pmb_nr++].match = match;
 		}
 
 		if (adjust_last_block(o, n, block_length) &&
@@ -6240,10 +6216,6 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 		if (o->color_moved) {
 			struct hashmap add_lines, del_lines;
 
-			if (o->color_moved_ws_handling &
-			    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
-				o->color_moved_ws_handling |= XDF_IGNORE_WHITESPACE;
-
 			hashmap_init(&del_lines, moved_entry_cmp, o, 0);
 			hashmap_init(&add_lines, moved_entry_cmp, o, 0);

From patchwork Mon Jun 14 13:04:48 2021
From: Phillip Wood
Date: Mon, 14 Jun 2021 13:04:48 +0000
Subject: [PATCH 10/10] diff --color-moved: intern strings
Message-Id: <220664dd907ed5e2183722fa2e1877f62c7d762a.1623675889.git.gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Phillip Wood
X-Patchwork-Id: 12318929

Taking inspiration from xdl_classify_record(), assign an id to each addition and deletion such that lines that match for the current --color-moved-ws mode share the same unique id. This reduces the number of hash lookups a little (calculating the ids still involves one hash lookup per line), but the main benefit is that when growing blocks of potentially moved lines we can replace string comparisons, which involve chasing a pointer, with a simple integer comparison.

On a large diff this commit reduces the time to run 'diff --color-moved' by 33% and 'diff --color-moved-ws=allow-indentation-change' by 20%. Compared to master, the time to run 'git log --patch --color-moved' is increased by 2% and the time to run 'git log --patch --color-moved-ws=allow-indentation-change' is reduced by 14%. These timings were performed on an i5-7200U; on an i5-3470 both commands are faster than master. The small speed decrease on commit-sized diffs is unfortunate, but I think it is small enough to be worth it for the gains on larger diffs.
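The id scheme described above can be illustrated with a small, self-contained C sketch. It is not the patch's code: it uses exact byte equality where git calls xdiff_compare_lines() with the --color-moved-ws flags, an FNV-1a hash with a fixed-size open-addressing table instead of git's hashmap and mem_pool, and the names intern(), build_lists() and count_candidates() are hypothetical. As in add_lines_to_move_detection(), each distinct line text receives the next free id and every line is prepended to a per-id list of additions or deletions, so looking up move candidates becomes an array index and checking whether a block continues becomes an integer comparison.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch, not git's implementation: give every distinct
 * line text a small integer id so that later comparisons are id == id
 * rather than memcmp() on the contents. Exact byte equality stands in
 * for git's whitespace-aware xdiff_compare_lines().
 */
#define TABLE_SIZE 64 /* power of two; caps the number of distinct lines */

struct interned {
	const char *text; /* NULL marks an empty slot */
	unsigned id;
};

struct entry {
	int line_no;              /* index of the line in the input */
	struct entry *next_match; /* next line with the same id and sign */
};

struct entry_list {
	struct entry *add, *del; /* heads of the per-id match lists */
};

/* FNV-1a string hash. */
static uint32_t hash_str(const char *s)
{
	uint32_t h = 2166136261u;

	for (; *s; s++)
		h = (h ^ (unsigned char)*s) * 16777619u;
	return h;
}

/* Return the id of 'text', assigning the next free id if it is new. */
static unsigned intern(struct interned *table, unsigned *next_id,
		       const char *text)
{
	uint32_t i = hash_str(text) & (TABLE_SIZE - 1);

	while (table[i].text) {
		if (!strcmp(table[i].text, text))
			return table[i].id;
		i = (i + 1) & (TABLE_SIZE - 1); /* linear probing */
	}
	assert(*next_id < TABLE_SIZE - 1); /* keep one slot free so probing ends */
	table[i].text = text;
	table[i].id = (*next_id)++;
	return table[i].id;
}

/*
 * lines[n] starts with '+' or '-'. Fills ids[] and entries[] and returns
 * an array indexed by id that the caller must free().
 */
static struct entry_list *build_lists(const char **lines, int nr,
				      unsigned *ids, struct entry *entries)
{
	struct interned table[TABLE_SIZE] = { { NULL, 0 } };
	struct entry_list *list = calloc(TABLE_SIZE, sizeof(*list));
	unsigned next_id = 0;

	if (!list)
		return NULL;
	for (int n = 0; n < nr; n++) {
		struct entry **head;

		ids[n] = intern(table, &next_id, lines[n] + 1); /* skip the sign */
		head = lines[n][0] == '+' ? &list[ids[n]].add : &list[ids[n]].del;
		entries[n].line_no = n;
		entries[n].next_match = *head; /* prepend to the per-id list */
		*head = &entries[n];
	}
	return list;
}

/* Count the deleted lines that an added line with this id may have come from. */
static int count_candidates(const struct entry_list *list, unsigned id)
{
	int count = 0;

	for (const struct entry *e = list[id].del; e; e = e->next_match)
		count++;
	return count;
}
```

For the input -foo();, -bar();, +foo();, -foo();, +baz(); the three foo(); lines share id 0, so the added foo(); has two deletion candidates, and the most recently seen deletion (line 3) heads the list because entries are prepended.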
Large diff before this change:
Benchmark #1: bin-wrappers/git diff --diff-algorithm=myers --color-moved --no-color-moved-ws v2.28.0 v2.29.0
  Time (mean ± σ):      1.147 s ±  0.005 s    [User: 1.085 s, System: 0.059 s]
  Range (min … max):    1.140 s …  1.154 s    10 runs

Benchmark #2: bin-wrappers/git diff --diff-algorithm=myers --color-moved --color-moved-ws=allow-indentation-change v2.28.0 v2.29.0
  Time (mean ± σ):      1.048 s ±  0.005 s    [User: 987.4 ms, System: 58.8 ms]
  Range (min … max):    1.043 s …  1.056 s    10 runs

Large diff after this change:
Benchmark #1: bin-wrappers/git diff --diff-algorithm=myers --color-moved --no-color-moved-ws v2.28.0 v2.29.0
  Time (mean ± σ):     762.7 ms ±   2.8 ms    [User: 707.5 ms, System: 53.7 ms]
  Range (min … max):   758.0 ms … 767.0 ms    10 runs

Benchmark #2: bin-wrappers/git diff --diff-algorithm=myers --color-moved --color-moved-ws=allow-indentation-change v2.28.0 v2.29.0
  Time (mean ± σ):     831.7 ms ±   1.7 ms    [User: 776.5 ms, System: 53.3 ms]
  Range (min … max):   829.2 ms … 835.1 ms    10 runs

Small diffs on master:
Benchmark #1: bin-wrappers/git log -p --diff-algorithm=myers --color-moved --no-color-moved-ws --no-merges -n1000 v2.29.0
  Time (mean ± σ):      1.567 s ±  0.001 s    [User: 1.443 s, System: 0.121 s]
  Range (min … max):    1.566 s …  1.571 s    10 runs

Benchmark #2: bin-wrappers/git log -p --diff-algorithm=myers --color-moved --color-moved-ws=allow-indentation-change -n1000 --no-merges v2.29.0
  Time (mean ± σ):      1.865 s ±  0.008 s    [User: 1.748 s, System: 0.112 s]
  Range (min … max):    1.857 s …  1.881 s    10 runs

Small diffs after this change:
Benchmark #1: bin-wrappers/git log -p --diff-algorithm=myers --color-moved --no-color-moved-ws --no-merges -n1000 v2.29.0
  Time (mean ± σ):      1.597 s ±  0.003 s    [User: 1.413 s, System: 0.179 s]
  Range (min … max):    1.591 s …  1.601 s    10 runs

Benchmark #2: bin-wrappers/git log -p --diff-algorithm=myers --color-moved --color-moved-ws=allow-indentation-change -n1000 --no-merges v2.29.0
  Time (mean ± σ):      1.606 s ±  0.006 s    [User: 1.420 s, System: 0.181 s]
  Range (min … max):    1.601 s …  1.622 s    10 runs

Signed-off-by: Phillip Wood
---
 diff.c | 173 ++++++++++++++++++++++++++++++++-------------------------
 1 file changed, 96 insertions(+), 77 deletions(-)

diff --git a/diff.c b/diff.c
index e6f3586b39bf..3260e2c60591 100644
--- a/diff.c
+++ b/diff.c
@@ -18,6 +18,7 @@
 #include "submodule-config.h"
 #include "submodule.h"
 #include "hashmap.h"
+#include "mem-pool.h"
 #include "ll-merge.h"
 #include "string-list.h"
 #include "strvec.h"
@@ -772,6 +773,7 @@ struct emitted_diff_symbol {
 	int flags;
 	int indent_off; /* Offset to first non-whitespace character */
 	int indent_width; /* The visual width of the indentation */
+	unsigned id;
 	enum diff_symbol s;
 };
 #define EMITTED_DIFF_SYMBOL_INIT {NULL}
@@ -797,9 +799,9 @@ static void append_emitted_diff_symbol(struct diff_options *o,
 }
 
 struct moved_entry {
-	struct hashmap_entry ent;
 	const struct emitted_diff_symbol *es;
 	struct moved_entry *next_line;
+	struct moved_entry *next_match;
 };
 
 struct moved_block {
@@ -866,24 +868,24 @@ static int cmp_in_block_with_wsd(const struct diff_options *o,
 				 const struct emitted_diff_symbol *l,
 				 struct moved_block *pmb)
 {
-	int al = cur->es->len, bl = l->len;
-	const char *a = cur->es->line,
-		   *b = l->line;
-	int a_off = cur->es->indent_off,
-	    a_width = cur->es->indent_width,
-	    b_off = l->indent_off,
-	    b_width = l->indent_width;
+	int a_width = cur->es->indent_width, b_width = l->indent_width;
 	int delta;
 
-	/* If 'l' and 'cur' are both blank then they match. */
-	if (a_width == INDENT_BLANKLINE && b_width == INDENT_BLANKLINE)
+	/* The text of each line must match */
+	if (cur->es->id != l->id)
+		return 1;
+
+	/*
+	 * If 'l' and 'cur' are both blank then we don't need to check the
+	 * indent. We only need to check cur as we know the strings match.
+	 * */
+	if (a_width == INDENT_BLANKLINE)
 		return 0;
 
 	/*
 	 * The indent changes of the block are known and stored in pmb->wsd;
 	 * however we need to check if the indent changes of the current line
-	 * match those of the current block and that the text of 'l' and 'cur'
-	 * after the indentation match.
+	 * match those of the current block.
 	 */
 	delta = b_width - a_width;
 
@@ -894,22 +896,26 @@ static int cmp_in_block_with_wsd(const struct diff_options *o,
 	if (pmb->wsd == INDENT_BLANKLINE)
 		pmb->wsd = delta;
 
-	return !(delta == pmb->wsd && al - a_off == bl - b_off &&
-		 !memcmp(a + a_off, b + b_off, al - a_off));
+	return delta != pmb->wsd;
 }
 
-static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
-			   const struct hashmap_entry *eptr,
-			   const struct hashmap_entry *entry_or_key,
-			   const void *keydata)
+struct interned_diff_symbol {
+	struct hashmap_entry ent;
+	struct emitted_diff_symbol *es;
+};
+
+static int interned_diff_symbol_cmp(const void *hashmap_cmp_fn_data,
+				    const struct hashmap_entry *eptr,
+				    const struct hashmap_entry *entry_or_key,
+				    const void *keydata)
 {
 	const struct diff_options *diffopt = hashmap_cmp_fn_data;
 	const struct emitted_diff_symbol *a, *b;
 	unsigned flags = diffopt->color_moved_ws_handling
 			 & XDF_WHITESPACE_FLAGS;
 
-	a = container_of(eptr, const struct moved_entry, ent)->es;
-	b = container_of(entry_or_key, const struct moved_entry, ent)->es;
+	a = container_of(eptr, const struct interned_diff_symbol, ent)->es;
+	b = container_of(entry_or_key, const struct interned_diff_symbol, ent)->es;
 
 	return !xdiff_compare_lines(a->line + a->indent_off,
 				    a->len - a->indent_off,
@@ -917,55 +923,81 @@ static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
 				    b->len - b->indent_off, flags);
 }
 
-static struct moved_entry *prepare_entry(struct diff_options *o,
-					 int line_no)
+static void prepare_entry(struct diff_options *o, struct emitted_diff_symbol *l,
+			  struct interned_diff_symbol *s)
 {
-	struct moved_entry *ret = xmalloc(sizeof(*ret));
-	struct emitted_diff_symbol *l = &o->emitted_symbols->buf[line_no];
 	unsigned flags = o->color_moved_ws_handling & XDF_WHITESPACE_FLAGS;
 	unsigned int hash = xdiff_hash_string(l->line + l->indent_off,
 					      l->len - l->indent_off, flags);
 
-	hashmap_entry_init(&ret->ent, hash);
-	ret->es = l;
-	ret->next_line = NULL;
-
-	return ret;
+	hashmap_entry_init(&s->ent, hash);
+	s->es = l;
 }
 
-static void add_lines_to_move_detection(struct diff_options *o,
-					struct hashmap *add_lines,
-					struct hashmap *del_lines)
+struct moved_entry_list {
+	struct moved_entry *add, *del;
+};
+
+static struct moved_entry_list *add_lines_to_move_detection(struct diff_options *o,
+							    struct mem_pool *entry_mem_pool)
 {
 	struct moved_entry *prev_line = NULL;
-
+	struct mem_pool interned_pool;
+	struct hashmap interned_map;
+	struct moved_entry_list *entry_list = NULL;
+	size_t entry_list_alloc = 0;
+	unsigned id = 0;
 	int n;
+
+	hashmap_init(&interned_map, interned_diff_symbol_cmp, o, 8096);
+	mem_pool_init(&interned_pool, 1024 * 1024);
+
 	for (n = 0; n < o->emitted_symbols->nr; n++) {
-		struct hashmap *hm;
-		struct moved_entry *key;
+		struct interned_diff_symbol key;
+		struct emitted_diff_symbol *l = &o->emitted_symbols->buf[n];
+		struct interned_diff_symbol *s;
+		struct moved_entry *entry;
 
-		switch (o->emitted_symbols->buf[n].s) {
-		case DIFF_SYMBOL_PLUS:
-			hm = add_lines;
-			break;
-		case DIFF_SYMBOL_MINUS:
-			hm = del_lines;
-			break;
-		default:
+		if (l->s != DIFF_SYMBOL_PLUS && l->s != DIFF_SYMBOL_MINUS) {
 			prev_line = NULL;
 			continue;
 		}
 
 		if (o->color_moved_ws_handling &
 		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
-			fill_es_indent_data(&o->emitted_symbols->buf[n]);
-		key = prepare_entry(o, n);
-		if (prev_line && prev_line->es->s == o->emitted_symbols->buf[n].s)
-			prev_line->next_line = key;
+			fill_es_indent_data(l);
 
-		hashmap_add(hm, &key->ent);
-		prev_line = key;
+		prepare_entry(o, l, &key);
+		s = hashmap_get_entry(&interned_map, &key, ent, &key.ent);
+		if (s) {
+			l->id = s->es->id;
+		} else {
+			l->id = id;
+			ALLOC_GROW_BY(entry_list, id, 1, entry_list_alloc);
+			hashmap_add(&interned_map,
+				    memcpy(mem_pool_alloc(&interned_pool,
+							  sizeof(key)),
+					   &key, sizeof(key)));
+		}
+		entry = mem_pool_alloc(entry_mem_pool, sizeof(*entry));
+		entry->es = l;
+		entry->next_line = NULL;
+		if (prev_line && prev_line->es->s == l->s)
+			prev_line->next_line = entry;
+		prev_line = entry;
+		if (l->s == DIFF_SYMBOL_PLUS) {
+			entry->next_match = entry_list[l->id].add;
+			entry_list[l->id].add = entry;
+		} else {
+			entry->next_match = entry_list[l->id].del;
+			entry_list[l->id].del = entry;
+		}
 	}
+
+	hashmap_clear(&interned_map);
+	mem_pool_discard(&interned_pool, 0);
+
+	return entry_list;
 }
 
 static void pmb_advance_or_null(struct diff_options *o,
@@ -974,7 +1006,6 @@ static void pmb_advance_or_null(struct diff_options *o,
 				int *pmb_nr)
 {
 	int i, j;
-	unsigned flags = o->color_moved_ws_handling & XDF_WHITESPACE_FLAGS;
 
 	for (i = 0, j = 0; i < *pmb_nr; i++) {
 		int match;
@@ -987,9 +1018,8 @@ static void pmb_advance_or_null(struct diff_options *o,
 			match = cur &&
 				!cmp_in_block_with_wsd(o, cur, l, &pmb[i]);
 		else
-			match = cur &&
-				xdiff_compare_lines(cur->es->line, cur->es->len,
-						    l->line, l->len, flags);
+			match = cur && cur->es->id == l->id;
+
 		if (match)
 			pmb[j++].match = cur;
 	}
@@ -1034,8 +1064,7 @@ static int adjust_last_block(struct diff_options *o, int n, int block_length)
 
 /* Find blocks of moved code, delegate actual coloring decision to helper */
 static void mark_color_as_moved(struct diff_options *o,
-				struct hashmap *add_lines,
-				struct hashmap *del_lines)
+				struct moved_entry_list *entry_list)
 {
 	struct moved_block *pmb = NULL; /* potentially moved blocks */
 	int pmb_nr = 0, pmb_alloc = 0;
@@ -1044,23 +1073,15 @@ static void mark_color_as_moved(struct diff_options *o,
 
 
 	for (n = 0; n < o->emitted_symbols->nr; n++) {
-		struct hashmap *hm = NULL;
-		struct moved_entry *key;
 		struct moved_entry *match = NULL;
 		struct emitted_diff_symbol *l = &o->emitted_symbols->buf[n];
 
 		switch (l->s) {
 		case DIFF_SYMBOL_PLUS:
-			hm = del_lines;
-			key = prepare_entry(o, n);
-			match = hashmap_get_entry(hm, key, ent, NULL);
-			free(key);
+			match = entry_list[l->id].del;
 			break;
 		case DIFF_SYMBOL_MINUS:
-			hm = add_lines;
-			key = prepare_entry(o, n);
-			match = hashmap_get_entry(hm, key, ent, NULL);
-			free(key);
+			match = entry_list[l->id].add;
 			break;
 		default:
 			flipped_block = 0;
@@ -1089,7 +1110,7 @@ static void mark_color_as_moved(struct diff_options *o,
 		 * The current line is the start of a new block.
 		 * Setup the set of potential blocks.
 		 */
-		hashmap_for_each_entry_from(hm, match, ent) {
+		for (; match; match = match->next_match) {
 			ALLOC_GROW(pmb, pmb_nr + 1, pmb_alloc);
 			if (o->color_moved_ws_handling &
 			    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
@@ -1460,7 +1481,7 @@ static void emit_diff_symbol_from_struct(struct diff_options *o,
 static void emit_diff_symbol(struct diff_options *o, enum diff_symbol s,
 			     const char *line, int len, unsigned flags)
 {
-	struct emitted_diff_symbol e = {line, len, flags, 0, 0, s};
+	struct emitted_diff_symbol e = {line, len, flags, 0, 0, 0, s};
 
 	if (o->emitted_symbols)
 		append_emitted_diff_symbol(o, &e);
@@ -6214,20 +6235,18 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 
 	if (o->emitted_symbols) {
 		if (o->color_moved) {
-			struct hashmap add_lines, del_lines;
-
-			hashmap_init(&del_lines, moved_entry_cmp, o, 0);
-			hashmap_init(&add_lines, moved_entry_cmp, o, 0);
+			struct mem_pool entry_pool;
+			struct moved_entry_list *entry_list;
 
-			add_lines_to_move_detection(o, &add_lines, &del_lines);
-			mark_color_as_moved(o, &add_lines, &del_lines);
+			mem_pool_init(&entry_pool, 1024 * 1024);
+			entry_list = add_lines_to_move_detection(o,
+								 &entry_pool);
+			mark_color_as_moved(o, entry_list);
 			if (o->color_moved == COLOR_MOVED_ZEBRA_DIM)
 				dim_moved_lines(o);
 
-			hashmap_clear_and_free(&add_lines, struct moved_entry,
-					       ent);
-			hashmap_clear_and_free(&del_lines, struct moved_entry,
-					       ent);
+			mem_pool_discard(&entry_pool, 0);
+			free(entry_list);
 		}
 
 		for (i = 0; i < esm.nr; i++)