[v2,2/2] blame: enable and test the sparse index

Message ID a0b6a152c754862323e9a5b89ad43ab34b6548f7.1634332836.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Series Sparse Index: diff and blame builtins

Commit Message

Lessley Dennington Oct. 15, 2021, 9:20 p.m. UTC
From: Lessley Dennington <lessleydennington@gmail.com>

Enable the sparse index for the 'git blame' command. The index was already
not expanded with this command, so the most interesting thing to do is to
add tests that verify that 'git blame' behaves correctly when the sparse
index is enabled and that its performance improves. More specifically, these
cases are:

1. The index is not expanded for 'blame' when given paths in the sparse
checkout cone at multiple levels.

2. Performance measurably improves for 'blame' with sparse index when given
paths in the sparse checkout cone at multiple levels.

The `p2000` tests demonstrate a ~60% execution time reduction when running
'blame' for a file two levels deep and a ~30% execution time reduction
for a file three levels deep.

Test                                         before  after
----------------------------------------------------------------
2000.62: git blame f2/f4/a (full-v3)         0.31    0.32 +3.2%
2000.63: git blame f2/f4/a (full-v4)         0.29    0.31 +6.9%
2000.64: git blame f2/f4/a (sparse-v3)       0.55    0.23 -58.2%
2000.65: git blame f2/f4/a (sparse-v4)       0.57    0.23 -59.6%
2000.66: git blame f2/f4/f3/a (full-v3)      0.77    0.85 +10.4%
2000.67: git blame f2/f4/f3/a (full-v4)      0.78    0.81 +3.8%
2000.68: git blame f2/f4/f3/a (sparse-v3)    1.07    0.72 -32.7%
2000.69: git blame f2/f4/f3/a (sparse-v4)    1.05    0.73 -30.5%

We do not include paths outside the sparse checkout cone because blame
currently does not support blaming files outside of the sparse definition.
Attempting to do so fails with the following error:

fatal: no such path '<path outside sparse definition>' in HEAD

Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
---
 builtin/blame.c                          |  3 +++
 t/perf/p2000-sparse-operations.sh        |  2 ++
 t/t1092-sparse-checkout-compatibility.sh | 24 +++++++++++++++++-------
 3 files changed, 22 insertions(+), 7 deletions(-)

Comments

Taylor Blau Oct. 25, 2021, 8:53 p.m. UTC | #1
On Fri, Oct 15, 2021 at 09:20:35PM +0000, Lessley Dennington via GitGitGadget wrote:
> From: Lessley Dennington <lessleydennington@gmail.com>
>
> Enable the sparse index for the 'git blame' command. The index was already
> not expanded with this command, so the most interesting thing to do is to
> add tests that verify that 'git blame' behaves correctly when the sparse
> index is enabled and that its performance improves. More specifically, these
> cases are:
>
> 1. The index is not expanded for 'blame' when given paths in the sparse
> checkout cone at multiple levels.
>
> 2. Performance measurably improves for 'blame' with sparse index when given
> paths in the sparse checkout cone at multiple levels.
>
> The `p2000` tests demonstrate a ~60% execution time reduction when running
> 'blame' for a file two levels deep and a ~30% execution time reduction
> for a file three levels deep.

Eek. What's eating up the other 30% when we have to open up another
layer of trees?

>
> Test                                         before  after
> ----------------------------------------------------------------
> 2000.62: git blame f2/f4/a (full-v3)         0.31    0.32 +3.2%
> 2000.63: git blame f2/f4/a (full-v4)         0.29    0.31 +6.9%
> 2000.64: git blame f2/f4/a (sparse-v3)       0.55    0.23 -58.2%
> 2000.65: git blame f2/f4/a (sparse-v4)       0.57    0.23 -59.6%
> 2000.66: git blame f2/f4/f3/a (full-v3)      0.77    0.85 +10.4%
> 2000.67: git blame f2/f4/f3/a (full-v4)      0.78    0.81 +3.8%
> 2000.68: git blame f2/f4/f3/a (sparse-v3)    1.07    0.72 -32.7%
> 2000.69: git blame f2/f4/f3/a (sparse-v4)    1.05    0.73 -30.5%
>
> We do not include paths outside the sparse checkout cone because blame
> currently does not support blaming files outside of the sparse definition.
> Attempting to do so fails with the following error:
>
> fatal: no such path '<path outside sparse definition>' in HEAD.

Small nit; this error message should be indented with a couple of space
characters to indicate that it's the output of running Git instead of
part of your patch message. Not worth a reroll on its own, but something
to keep in mind for your many future patches :).

>
> Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
> ---
>  builtin/blame.c                          |  3 +++
>  t/perf/p2000-sparse-operations.sh        |  2 ++
>  t/t1092-sparse-checkout-compatibility.sh | 24 +++++++++++++++++-------
>  3 files changed, 22 insertions(+), 7 deletions(-)
>
> diff --git a/builtin/blame.c b/builtin/blame.c
> index 641523ff9af..af3d81e2bd4 100644
> --- a/builtin/blame.c
> +++ b/builtin/blame.c
> @@ -902,6 +902,9 @@ int cmd_blame(int argc, const char **argv, const char *prefix)
>  	long anchor;
>  	const int hexsz = the_hash_algo->hexsz;
>
> +	prepare_repo_settings(the_repository);
> +	the_repository->settings.command_requires_full_index = 0;
> +

By now we're quite used to seeing this ;). Makes sense to me.

>  	setup_default_color_by_age();
>  	git_config(git_blame_config, &output_option);
>  	repo_init_revisions(the_repository, &revs, NULL);
> diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
> index bff93f16e93..9ac76a049b8 100755
> --- a/t/perf/p2000-sparse-operations.sh
> +++ b/t/perf/p2000-sparse-operations.sh
> @@ -115,5 +115,7 @@ test_perf_on_all git reset --hard
>  test_perf_on_all git reset -- does-not-exist
>  test_perf_on_all git diff
>  test_perf_on_all git diff --staged
> +test_perf_on_all git blame $SPARSE_CONE/a
> +test_perf_on_all git blame $SPARSE_CONE/f3/a

Good.

>  test_done
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index e5d15be9d45..960ccf2d150 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -488,15 +488,16 @@ test_expect_success 'blame with pathspec inside sparse definition' '
>  	test_all_match git blame deep/deeper1/deepest/a
>  '
>
> -# TODO: blame currently does not support blaming files outside of the
> -# sparse definition. It complains that the file doesn't exist locally.
> -test_expect_failure 'blame with pathspec outside sparse definition' '
> +# Blame does not support blaming files outside of the sparse
> +# definition, so we verify this scenario.
> +test_expect_success 'blame with pathspec outside sparse definition' '
>  	init_repos &&
>
> -	test_all_match git blame folder1/a &&
> -	test_all_match git blame folder2/a &&
> -	test_all_match git blame deep/deeper2/a &&
> -	test_all_match git blame deep/deeper2/deepest/a
> +	test_sparse_match git sparse-checkout set &&
> +	test_sparse_match test_must_fail git blame folder1/a &&
> +	test_sparse_match test_must_fail git blame folder2/a &&
> +	test_sparse_match test_must_fail git blame deep/deeper2/a &&
> +	test_sparse_match test_must_fail git blame deep/deeper2/deepest/a
>  '

test_must_fail used to allow for segfaults, but doesn't these days. So
this is a good test of "it should fail in sparse checkouts but not
crash", although I think it would be good to ensure that it's failing in
the way you expect (i.e., by checking that stderr contains "no such path
<xyz> in HEAD").
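
The check suggested above (fail, and fail with the expected message) can be sketched in plain POSIX shell. This is only an illustration, not what a later version of the patch does: `fake_blame` and `fails_with_message` are hypothetical helpers invented here, with `fake_blame` standing in for 'git blame' on a path outside the sparse definition. Inside t1092 the same idea would presumably be expressed with the test library's own helpers.

```shell
#!/bin/sh
# Hypothetical stand-in for `git blame <path>` on a path outside the
# sparse definition: print the error quoted in the commit message and fail.
fake_blame () {
	printf "fatal: no such path '%s' in HEAD\n" "$1" >&2
	return 128
}

# Hypothetical helper: succeed only if the command fails AND its stderr
# contains the expected text.
fails_with_message () {
	expected=$1
	shift
	# Capture stderr only; stdout is discarded.
	if err=$("$@" 2>&1 >/dev/null)
	then
		echo "unexpected success: $*" >&2
		return 1
	fi
	case "$err" in
	*"$expected"*) return 0 ;;
	*) return 1 ;;
	esac
}

fails_with_message "no such path 'folder1/a' in HEAD" fake_blame folder1/a &&
echo "failed as expected"
```

Checking the message as well as the exit status guards against the command failing for an unrelated reason.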
>
>  test_expect_success 'checkout and reset (mixed)' '
> @@ -874,6 +875,15 @@ test_expect_success 'sparse-index is not expanded: merge conflict in cone' '
>  	)
>  '
>
> +test_expect_success 'sparse index is not expanded: blame' '
> +	init_repos &&
> +
> +	ensure_not_expanded blame a &&
> +	ensure_not_expanded blame deep/a &&
> +	ensure_not_expanded blame deep/deeper1/a &&
> +	ensure_not_expanded blame deep/deeper1/deepest/a
> +'

Makes sense. Probably just one of these is necessary, but I haven't
looked into init_repos (or the "setup" test) enough to know for sure.
Either way, not worth changing.

Thanks,
Taylor
Lessley Dennington Oct. 26, 2021, 4:17 p.m. UTC | #2
On 10/25/21 1:53 PM, Taylor Blau wrote:
> On Fri, Oct 15, 2021 at 09:20:35PM +0000, Lessley Dennington via GitGitGadget wrote:
>> From: Lessley Dennington <lessleydennington@gmail.com>
>>
>> Enable the sparse index for the 'git blame' command. The index was already
>> not expanded with this command, so the most interesting thing to do is to
>> add tests that verify that 'git blame' behaves correctly when the sparse
>> index is enabled and that its performance improves. More specifically, these
>> cases are:
>>
>> 1. The index is not expanded for 'blame' when given paths in the sparse
>> checkout cone at multiple levels.
>>
>> 2. Performance measurably improves for 'blame' with sparse index when given
>> paths in the sparse checkout cone at multiple levels.
>>
>> The `p2000` tests demonstrate a ~60% execution time reduction when running
>> 'blame' for a file two levels deep and a ~30% execution time reduction
>> for a file three levels deep.
> 
> Eek. What's eating up the other 30% when we have to open up another
> layer of trees?
> 
I'm not sure, to be totally honest. However, given these are both pretty
good time reductions I don't think we should be terribly concerned.
>>
>> Test                                         before  after
>> ----------------------------------------------------------------
>> 2000.62: git blame f2/f4/a (full-v3)         0.31    0.32 +3.2%
>> 2000.63: git blame f2/f4/a (full-v4)         0.29    0.31 +6.9%
>> 2000.64: git blame f2/f4/a (sparse-v3)       0.55    0.23 -58.2%
>> 2000.65: git blame f2/f4/a (sparse-v4)       0.57    0.23 -59.6%
>> 2000.66: git blame f2/f4/f3/a (full-v3)      0.77    0.85 +10.4%
>> 2000.67: git blame f2/f4/f3/a (full-v4)      0.78    0.81 +3.8%
>> 2000.68: git blame f2/f4/f3/a (sparse-v3)    1.07    0.72 -32.7%
>> 2000.69: git blame f2/f4/f3/a (sparse-v4)    1.05    0.73 -30.5%
>>
>> We do not include paths outside the sparse checkout cone because blame
>> currently does not support blaming files outside of the sparse definition.
>> Attempting to do so fails with the following error:
>>
>> fatal: no such path '<path outside sparse definition>' in HEAD.
> 
> Small nit; this error message should be indented with a couple of space
> characters to indicate that it's the output of running Git instead of
> part of your patch message. Not worth a reroll on its own, but something
> to keep in mind for your many future patches :).
> 
Eh, I'm making some changes based on your suggestions anyway, so I'm 
including this in v3. Thanks for letting me know!
>>
>> Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
>> ---
>>   builtin/blame.c                          |  3 +++
>>   t/perf/p2000-sparse-operations.sh        |  2 ++
>>   t/t1092-sparse-checkout-compatibility.sh | 24 +++++++++++++++++-------
>>   3 files changed, 22 insertions(+), 7 deletions(-)
>>
>> diff --git a/builtin/blame.c b/builtin/blame.c
>> index 641523ff9af..af3d81e2bd4 100644
>> --- a/builtin/blame.c
>> +++ b/builtin/blame.c
>> @@ -902,6 +902,9 @@ int cmd_blame(int argc, const char **argv, const char *prefix)
>>   	long anchor;
>>   	const int hexsz = the_hash_algo->hexsz;
>>
>> +	prepare_repo_settings(the_repository);
>> +	the_repository->settings.command_requires_full_index = 0;
>> +
> 
> By now we're quite used to seeing this ;). Makes sense to me.
> 
>>   	setup_default_color_by_age();
>>   	git_config(git_blame_config, &output_option);
>>   	repo_init_revisions(the_repository, &revs, NULL);
>> diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
>> index bff93f16e93..9ac76a049b8 100755
>> --- a/t/perf/p2000-sparse-operations.sh
>> +++ b/t/perf/p2000-sparse-operations.sh
>> @@ -115,5 +115,7 @@ test_perf_on_all git reset --hard
>>   test_perf_on_all git reset -- does-not-exist
>>   test_perf_on_all git diff
>>   test_perf_on_all git diff --staged
>> +test_perf_on_all git blame $SPARSE_CONE/a
>> +test_perf_on_all git blame $SPARSE_CONE/f3/a
> 
> Good.
> 
>>   test_done
>> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
>> index e5d15be9d45..960ccf2d150 100755
>> --- a/t/t1092-sparse-checkout-compatibility.sh
>> +++ b/t/t1092-sparse-checkout-compatibility.sh
>> @@ -488,15 +488,16 @@ test_expect_success 'blame with pathspec inside sparse definition' '
>>   	test_all_match git blame deep/deeper1/deepest/a
>>   '
>>
>> -# TODO: blame currently does not support blaming files outside of the
>> -# sparse definition. It complains that the file doesn't exist locally.
>> -test_expect_failure 'blame with pathspec outside sparse definition' '
>> +# Blame does not support blaming files outside of the sparse
>> +# definition, so we verify this scenario.
>> +test_expect_success 'blame with pathspec outside sparse definition' '
>>   	init_repos &&
>>
>> -	test_all_match git blame folder1/a &&
>> -	test_all_match git blame folder2/a &&
>> -	test_all_match git blame deep/deeper2/a &&
>> -	test_all_match git blame deep/deeper2/deepest/a
>> +	test_sparse_match git sparse-checkout set &&
>> +	test_sparse_match test_must_fail git blame folder1/a &&
>> +	test_sparse_match test_must_fail git blame folder2/a &&
>> +	test_sparse_match test_must_fail git blame deep/deeper2/a &&
>> +	test_sparse_match test_must_fail git blame deep/deeper2/deepest/a
>>   '
> 
> test_must_fail used to allow for segfaults, but doesn't these days. So
> this is a good test of "it should fail in sparse checkouts but not
> crash", although I think it would be good to ensure that it's failing in
> the way you expect (i.e., by checking that stderr contains "no such path
> <xyz> in HEAD").
Good suggestion, coming in v3!
>>
>>   test_expect_success 'checkout and reset (mixed)' '
>> @@ -874,6 +875,15 @@ test_expect_success 'sparse-index is not expanded: merge conflict in cone' '
>>   	)
>>   '
>>
>> +test_expect_success 'sparse index is not expanded: blame' '
>> +	init_repos &&
>> +
>> +	ensure_not_expanded blame a &&
>> +	ensure_not_expanded blame deep/a &&
>> +	ensure_not_expanded blame deep/deeper1/a &&
>> +	ensure_not_expanded blame deep/deeper1/deepest/a
>> +'
> 
> Makes sense. Probably just one of these is necessary, but I haven't
> looked into init_repos (or the "setup" test) enough to know for sure.
> Either way, not worth changing.
> 
> Thanks,
> Taylor
>
Elijah Newren Nov. 21, 2021, 1:32 a.m. UTC | #3
On Tue, Oct 26, 2021 at 9:17 AM Lessley Dennington
<lessleydennington@gmail.com> wrote:
>
> On 10/25/21 1:53 PM, Taylor Blau wrote:
> > On Fri, Oct 15, 2021 at 09:20:35PM +0000, Lessley Dennington via GitGitGadget wrote:
> >> From: Lessley Dennington <lessleydennington@gmail.com>
> >>
> >> Enable the sparse index for the 'git blame' command. The index was already
> >> not expanded with this command, so the most interesting thing to do is to
> >> add tests that verify that 'git blame' behaves correctly when the sparse
> >> index is enabled and that its performance improves. More specifically, these
> >> cases are:
> >>
> >> 1. The index is not expanded for 'blame' when given paths in the sparse
> >> checkout cone at multiple levels.
> >>
> >> 2. Performance measurably improves for 'blame' with sparse index when given
> >> paths in the sparse checkout cone at multiple levels.
> >>
> >> The `p2000` tests demonstrate a ~60% execution time reduction when running
> >> 'blame' for a file two levels deep and a ~30% execution time reduction
> >> for a file three levels deep.
> >
> > Eek. What's eating up the other 30% when we have to open up another
> > layer of trees?
> >
> I'm not sure, to be totally honest. However, given these are both pretty
> good time reductions I don't think we should be terribly concerned.

It's not something eating up more time in the sparse-index code; let's
look a bit closer...

> >>
> >> Test                                         before  after
> >> ----------------------------------------------------------------
> >> 2000.62: git blame f2/f4/a (full-v3)         0.31    0.32 +3.2%
> >> 2000.63: git blame f2/f4/a (full-v4)         0.29    0.31 +6.9%
> >> 2000.64: git blame f2/f4/a (sparse-v3)       0.55    0.23 -58.2%
> >> 2000.65: git blame f2/f4/a (sparse-v4)       0.57    0.23 -59.6%
> >> 2000.66: git blame f2/f4/f3/a (full-v3)      0.77    0.85 +10.4%
> >> 2000.67: git blame f2/f4/f3/a (full-v4)      0.78    0.81 +3.8%
> >> 2000.68: git blame f2/f4/f3/a (sparse-v3)    1.07    0.72 -32.7%
> >> 2000.69: git blame f2/f4/f3/a (sparse-v4)    1.05    0.73 -30.5%

Time was ~0.55s for the full at two levels deep, and dropped by just
over 0.3s in sparse-index.
Time was ~1.05s for the full at three levels deep, and dropped by just
over 0.3s in sparse-index.

So, the sparse-index enabling saves us the same amount of time, it's
just that the overall execution time for the non-sparse-index
comparison point goes up.  Saving the same amount of time for the two
cases seems intuitive to me; both cases get to avoid looking at the
same number of index entries outside the sparsity paths.
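
The arithmetic behind that observation can be reproduced with a small shell sketch. The `saving` helper is a hypothetical name introduced here for illustration; the timings are copied from the sparse-v3 and sparse-v4 rows of the table quoted above. It prints the absolute saving in seconds followed by the relative saving in percent.

```shell
#!/bin/sh
# Hypothetical helper: print "<absolute saving> <relative saving %>"
# for a before/after timing pair given in seconds.
saving () {
	awk -v before="$1" -v after="$2" 'BEGIN {
		printf "%.2f %.1f\n", before - after, (before - after) / before * 100
	}'
}

saving 0.55 0.23   # git blame f2/f4/a    (sparse-v3)
saving 0.57 0.23   # git blame f2/f4/a    (sparse-v4)
saving 1.07 0.72   # git blame f2/f4/f3/a (sparse-v3)
saving 1.05 0.73   # git blame f2/f4/f3/a (sparse-v4)
```

The absolute savings all land between 0.32s and 0.35s, while the percentage shrinks for the deeper path only because its baseline is larger.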
Patch

diff --git a/builtin/blame.c b/builtin/blame.c
index 641523ff9af..af3d81e2bd4 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -902,6 +902,9 @@  int cmd_blame(int argc, const char **argv, const char *prefix)
 	long anchor;
 	const int hexsz = the_hash_algo->hexsz;
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+
 	setup_default_color_by_age();
 	git_config(git_blame_config, &output_option);
 	repo_init_revisions(the_repository, &revs, NULL);
diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
index bff93f16e93..9ac76a049b8 100755
--- a/t/perf/p2000-sparse-operations.sh
+++ b/t/perf/p2000-sparse-operations.sh
@@ -115,5 +115,7 @@  test_perf_on_all git reset --hard
 test_perf_on_all git reset -- does-not-exist
 test_perf_on_all git diff
 test_perf_on_all git diff --staged
+test_perf_on_all git blame $SPARSE_CONE/a
+test_perf_on_all git blame $SPARSE_CONE/f3/a
 
 test_done
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index e5d15be9d45..960ccf2d150 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -488,15 +488,16 @@  test_expect_success 'blame with pathspec inside sparse definition' '
 	test_all_match git blame deep/deeper1/deepest/a
 '
 
-# TODO: blame currently does not support blaming files outside of the
-# sparse definition. It complains that the file doesn't exist locally.
-test_expect_failure 'blame with pathspec outside sparse definition' '
+# Blame does not support blaming files outside of the sparse
+# definition, so we verify this scenario.
+test_expect_success 'blame with pathspec outside sparse definition' '
 	init_repos &&
 
-	test_all_match git blame folder1/a &&
-	test_all_match git blame folder2/a &&
-	test_all_match git blame deep/deeper2/a &&
-	test_all_match git blame deep/deeper2/deepest/a
+	test_sparse_match git sparse-checkout set &&
+	test_sparse_match test_must_fail git blame folder1/a &&
+	test_sparse_match test_must_fail git blame folder2/a &&
+	test_sparse_match test_must_fail git blame deep/deeper2/a &&
+	test_sparse_match test_must_fail git blame deep/deeper2/deepest/a
 '
 
 test_expect_success 'checkout and reset (mixed)' '
@@ -874,6 +875,15 @@  test_expect_success 'sparse-index is not expanded: merge conflict in cone' '
 	)
 '
 
+test_expect_success 'sparse index is not expanded: blame' '
+	init_repos &&
+
+	ensure_not_expanded blame a &&
+	ensure_not_expanded blame deep/a &&
+	ensure_not_expanded blame deep/deeper1/a &&
+	ensure_not_expanded blame deep/deeper1/deepest/a
+'
+
 # NEEDSWORK: a sparse-checkout behaves differently from a full checkout
 # in this scenario, but it shouldn't.
 test_expect_success 'reset mixed and checkout orphan' '