[v1,7/9] diff --color-moved-ws: optimize allow-indentation-change
diff mbox series

Message ID 20181116110356.12311-8-phillip.wood@talktalk.net
State New
Headers show
Series
  • diff --color-moved-ws fixes and enhancment
Related show

Commit Message

Phillip Wood Nov. 16, 2018, 11:03 a.m. UTC
From: Phillip Wood <phillip.wood@dunelm.org.uk>

When running

  git diff --color-moved-ws=allow-indentation-change v2.18.0 v2.19.0

cmp_in_block_with_wsd() is called 694908327 times. Of those 42.7%
return after comparing a and b. By comparing the lengths first we can
return early in all but 0.03% of those cases without dereferencing the
string pointers. The comparison between a and c fails in 6.8% of
calls, by comparing the lengths first we reject all the failing calls
without dereferencing the string pointers.

This reduces the time to run the command above by by 42% from 14.6s to
8.5s. This is still much slower than the normal --color-moved which
takes ~0.6-0.7s to run but is a significant improvement.

The next commits will replace the current implementation with one that
works with mixed tabs and spaces in the indentation. I think it is
worth optimizing the current implementation first to enable a fair
comparison between the two implementations.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

Comments

Stefan Beller Nov. 16, 2018, 8:40 p.m. UTC | #1
On Fri, Nov 16, 2018 at 3:04 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
>
> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>
> When running
>
>   git diff --color-moved-ws=allow-indentation-change v2.18.0 v2.19.0
>
> cmp_in_block_with_wsd() is called 694908327 times. Of those 42.7%
> return after comparing a and b. By comparing the lengths first we can
> return early in all but 0.03% of those cases without dereferencing the
> string pointers. The comparison between a and c fails in 6.8% of
> calls, by comparing the lengths first we reject all the failing calls
> without dereferencing the string pointers.
>
> This reduces the time to run the command above by by 42% from 14.6s to
> 8.5s. This is still much slower than the normal --color-moved which
> takes ~0.6-0.7s to run but is a significant improvement.
>
> The next commits will replace the current implementation with one that
> works with mixed tabs and spaces in the indentation. I think it is
> worth optimizing the current implementation first to enable a fair
> comparison between the two implementations.

Up to here the series looks good and I think we could take it
as a preparatory self-standing series.

I'll read on.
Thanks,
Stefan
Phillip Wood Nov. 17, 2018, 2:52 p.m. UTC | #2
On 16/11/2018 20:40, Stefan Beller wrote:
> On Fri, Nov 16, 2018 at 3:04 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
>>
>> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>>
>> When running
>>
>>    git diff --color-moved-ws=allow-indentation-change v2.18.0 v2.19.0
>>
>> cmp_in_block_with_wsd() is called 694908327 times. Of those 42.7%
>> return after comparing a and b. By comparing the lengths first we can
>> return early in all but 0.03% of those cases without dereferencing the
>> string pointers. The comparison between a and c fails in 6.8% of
>> calls, by comparing the lengths first we reject all the failing calls
>> without dereferencing the string pointers.
>>
>> This reduces the time to run the command above by by 42% from 14.6s to
>> 8.5s. This is still much slower than the normal --color-moved which
>> takes ~0.6-0.7s to run but is a significant improvement.
>>
>> The next commits will replace the current implementation with one that
>> works with mixed tabs and spaces in the indentation. I think it is
>> worth optimizing the current implementation first to enable a fair
>> comparison between the two implementations.
> 
> Up to here the series looks good and I think we could take it
> as a preparatory self-standing series.

Thanks for looking at these, I think it makes sense to split the series 
here, the commit message for this patch may want tweaking slightly if we 
do. (I did wonder about splitting it in two when I submitted it but took 
the easy way out.)

Best Wishes

Phillip
> 
> I'll read on.
> Thanks,
> Stefan
>

Patch
diff mbox series

diff --git a/diff.c b/diff.c
index 8c08dd68df..c378ce3daf 100644
--- a/diff.c
+++ b/diff.c
@@ -829,20 +829,23 @@  static int cmp_in_block_with_wsd(const struct diff_options *o,
 				 int n)
 {
 	struct emitted_diff_symbol *l = &o->emitted_symbols->buf[n];
-	int al = cur->es->len, cl = l->len;
+	int al = cur->es->len, bl = match->es->len, cl = l->len;
 	const char *a = cur->es->line,
 		   *b = match->es->line,
 		   *c = l->line;
-
+	const char *orig_a = a;
 	int wslen;
 
 	/*
-	 * We need to check if 'cur' is equal to 'match'.
-	 * As those are from the same (+/-) side, we do not need to adjust for
-	 * indent changes. However these were found using fuzzy matching
-	 * so we do have to check if they are equal.
+	 * We need to check if 'cur' is equal to 'match'.  As those
+	 * are from the same (+/-) side, we do not need to adjust for
+	 * indent changes. However these were found using fuzzy
+	 * matching so we do have to check if they are equal. Here we
+	 * just check the lengths. We delay calling memcmp() to check
+	 * the contents until later as if the length comparison for a
+	 * and c fails we can avoid the call all together.
 	 */
-	if (strcmp(a, b))
+	if (al != bl)
 		return 1;
 
 	if (!pmb->wsd.string)
@@ -870,7 +873,7 @@  static int cmp_in_block_with_wsd(const struct diff_options *o,
 		al -= wslen;
 	}
 
-	if (al != cl || memcmp(a, c, al))
+	if (al != cl || memcmp(orig_a, b, bl) || memcmp(a, c, al))
 		return 1;
 
 	return 0;