diff mbox series

[v3,2/2] blame: enable and test the sparse index

Message ID cfdd33129ec6860cbec0cb20302598429db1115e.1635802069.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series Sparse Index: diff and blame builtins | expand

Commit Message

Lessley Dennington Nov. 1, 2021, 9:27 p.m. UTC
From: Lessley Dennington <lessleydennington@gmail.com>

Enable the sparse index for the 'git blame' command. The index was already
not expanded with this command, so the most interesting thing to do is to
add tests that verify that 'git blame' behaves correctly when the sparse
index is enabled and that its performance improves. More specifically, these
cases are:

1. The index is not expanded for 'blame' when given paths in the sparse
checkout cone at multiple levels.

2. Performance measurably improves for 'blame' with sparse index when given
paths in the sparse checkout cone at multiple levels.

The `p2000` tests demonstrate a ~60% execution time reduction when running
'blame' for a file two levels deep and and a ~30% execution time reduction
for a file three levels deep.

Test                                         before  after
----------------------------------------------------------------
2000.62: git blame f2/f4/a (full-v3)         0.31    0.32 +3.2%
2000.63: git blame f2/f4/a (full-v4)         0.29    0.31 +6.9%
2000.64: git blame f2/f4/a (sparse-v3)       0.55    0.23 -58.2%
2000.65: git blame f2/f4/a (sparse-v4)       0.57    0.23 -59.6%
2000.66: git blame f2/f4/f3/a (full-v3)      0.77    0.85 +10.4%
2000.67: git blame f2/f4/f3/a (full-v4)      0.78    0.81 +3.8%
2000.68: git blame f2/f4/f3/a (sparse-v3)    1.07    0.72 -32.7%
2000.99: git blame f2/f4/f3/a (sparse-v4)    1.05    0.73 -30.5%

We do not include paths outside the sparse checkout cone because blame
currently does not support blaming files outside of the sparse definition.
Attempting to do so fails with the following error:

  fatal: no such path '<path outside sparse definition>' in HEAD

Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
---
 builtin/blame.c                          |  3 ++
 t/perf/p2000-sparse-operations.sh        |  2 +
 t/t1092-sparse-checkout-compatibility.sh | 48 ++++++++++++++++++------
 3 files changed, 42 insertions(+), 11 deletions(-)

Comments

Junio C Hamano Nov. 3, 2021, 4:47 p.m. UTC | #1
"Lessley Dennington via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> We do not include paths outside the sparse checkout cone because blame
> currently does not support blaming files outside of the sparse definition.
> Attempting to do so fails with the following error:
>
>   fatal: no such path '<path outside sparse definition>' in HEAD

Does this indicate that we need to update how the command line
safety in verify_working_tree_path() works in a sparsely checked out
working tree?  If foo/bar is outside the sparse definition,

    git blame HEAD foo/bar

may get such a message, but shouldn't

    git blame HEAD -- foo/bar

make it work?

> -# TODO: blame currently does not support blaming files outside of the
> -# sparse definition. It complains that the file doesn't exist locally.
> -test_expect_failure 'blame with pathspec outside sparse definition' '
> +# Blame does not support blaming files outside of the sparse
> +# definition, so we verify this scenario.

IOW, why is it a good idea to drop the "TODO" and "currently" and pretend
as if the current behaviour is the desirable one?

> +test_expect_success 'blame with pathspec outside sparse definition' '
>  	init_repos &&
> +	test_sparse_match git sparse-checkout set &&
>  
> -	test_all_match git blame folder1/a &&
> -	test_all_match git blame folder2/a &&
> -	test_all_match git blame deep/deeper2/a &&
> -	test_all_match git blame deep/deeper2/deepest/a
> +	for file in a \
> +			deep/a \
> +			deep/deeper1/a \
> +			deep/deeper1/deepest/a
> +	do
> +		test_sparse_match test_must_fail git blame $file &&
> +		cat >expect <<-EOF &&
> +		fatal: Cannot lstat '"'"'$file'"'"': No such file or directory
> +		EOF
> +		# We compare sparse-checkout-err and sparse-index-err in
> +		# `test_sparse_match`. Given we know they are the same, we
> +		# only check the content of sparse-index-err here.
> +		test_cmp expect sparse-index-err
> +	done
>  '
>  
>  test_expect_success 'checkout and reset (mixed)' '
> @@ -878,6 +892,18 @@ test_expect_success 'sparse index is not expanded: diff' '
>  	ensure_not_expanded diff --staged
>  '
>  
> +test_expect_success 'sparse index is not expanded: blame' '
> +	init_repos &&
> +
> +	for file in a \
> +			deep/a \
> +			deep/deeper1/a \
> +			deep/deeper1/deepest/a
> +	do
> +		ensure_not_expanded blame $file
> +	done
> +'
> +
>  # NEEDSWORK: a sparse-checkout behaves differently from a full checkout
>  # in this scenario, but it shouldn't.
>  test_expect_success 'reset mixed and checkout orphan' '
Lessley Dennington Nov. 5, 2021, 12:04 a.m. UTC | #2
On 11/3/21 9:47 AM, Junio C Hamano wrote:
> "Lessley Dennington via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
> 
>> We do not include paths outside the sparse checkout cone because blame
>> currently does not support blaming files outside of the sparse definition.
>> Attempting to do so fails with the following error:
>>
>>    fatal: no such path '<path outside sparse definition>' in HEAD
> 
> Does this indicate that we need to update how the command line
> safety in verify_working_tree_path() works in a sparsely checked out
> working tree?  If foo/bar is outside the sparse definition,
> 
>      git blame HEAD foo/bar
> 
> may get such a message, but shouldn't
> 
>      git blame HEAD -- foo/bar
> 
> make it work?
> 
This is something we could consider for the future, and I've updated the 
comment you called out below to better reflect that in the next version. 
However, I don't know that this change belongs in this series, which 
aims to enable the sparse index. Perhaps even updating the 'blame with 
pathspec outside sparse definition' test was a bit too far outside the 
target scope of this change - I can remove my updates if that is the case.

>> -# TODO: blame currently does not support blaming files outside of the
>> -# sparse definition. It complains that the file doesn't exist locally.
>> -test_expect_failure 'blame with pathspec outside sparse definition' '
>> +# Blame does not support blaming files outside of the sparse
>> +# definition, so we verify this scenario.
> 
> IOW, why is it a good idea to drop the "TODO" and "currently" and pretend
> as if the current behaviour is the desirable one?
> 
Thank you, updated for v4.

>> +test_expect_success 'blame with pathspec outside sparse definition' '
>>   	init_repos &&
>> +	test_sparse_match git sparse-checkout set &&
>>   
>> -	test_all_match git blame folder1/a &&
>> -	test_all_match git blame folder2/a &&
>> -	test_all_match git blame deep/deeper2/a &&
>> -	test_all_match git blame deep/deeper2/deepest/a
>> +	for file in a \
>> +			deep/a \
>> +			deep/deeper1/a \
>> +			deep/deeper1/deepest/a
>> +	do
>> +		test_sparse_match test_must_fail git blame $file &&
>> +		cat >expect <<-EOF &&
>> +		fatal: Cannot lstat '"'"'$file'"'"': No such file or directory
>> +		EOF
>> +		# We compare sparse-checkout-err and sparse-index-err in
>> +		# `test_sparse_match`. Given we know they are the same, we
>> +		# only check the content of sparse-index-err here.
>> +		test_cmp expect sparse-index-err
>> +	done
>>   '
>>   
>>   test_expect_success 'checkout and reset (mixed)' '
>> @@ -878,6 +892,18 @@ test_expect_success 'sparse index is not expanded: diff' '
>>   	ensure_not_expanded diff --staged
>>   '
>>   
>> +test_expect_success 'sparse index is not expanded: blame' '
>> +	init_repos &&
>> +
>> +	for file in a \
>> +			deep/a \
>> +			deep/deeper1/a \
>> +			deep/deeper1/deepest/a
>> +	do
>> +		ensure_not_expanded blame $file
>> +	done
>> +'
>> +
>>   # NEEDSWORK: a sparse-checkout behaves differently from a full checkout
>>   # in this scenario, but it shouldn't.
>>   test_expect_success 'reset mixed and checkout orphan' '

Best,

Lessley
Elijah Newren Nov. 21, 2021, 1:46 a.m. UTC | #3
On Wed, Nov 3, 2021 at 9:47 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Lessley Dennington via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
> > We do not include paths outside the sparse checkout cone because blame
> > currently does not support blaming files outside of the sparse definition.
> > Attempting to do so fails with the following error:
> >
> >   fatal: no such path '<path outside sparse definition>' in HEAD
>
> Does this indicate that we need to update how the command line
> safety in verify_working_tree_path() works in a sparsely checked out
> working tree?

I wondered the same thing, but no I don't think we need any extra
command line safety here.  The behavior Lessley is reporting here is
equivalent to the following in a regular (non-sparse) checkout:

$ rm t/test-lib.sh
$ git blame t/test-lib.sh
fatal: Cannot lstat 't/test-lib.sh': No such file or directory
$ git blame -- t/test-lib.sh
fatal: Cannot lstat 't/test-lib.sh': No such file or directory

blame without a revision has always failed for files not in the
working tree, regardless of whether those files are found in the index
or HEAD.

>  If foo/bar is outside the sparse definition,
>
>     git blame HEAD foo/bar

Actually, that works; there's no error and the code with Lessley's
patch will show the blame info for foo/bar (assuming foo/bar was a
path in HEAD, of course).

> may get such a message, but shouldn't
>
>     git blame HEAD -- foo/bar
>
> make it work?

This also works.  But both of these things are kind of testing
something different; when given a revision, the checkout is irrelevant
to git blame: git blame with a revision will work regardless of
whether the checkout is full, sparse, completely empty, or
non-existent (i.e. a bare clone).

> > -# TODO: blame currently does not support blaming files outside of the
> > -# sparse definition. It complains that the file doesn't exist locally.
> > -test_expect_failure 'blame with pathspec outside sparse definition' '
> > +# Blame does not support blaming files outside of the sparse
> > +# definition, so we verify this scenario.
>
> IOW, why is it a good idea to drop the "TODO" and "currently" and pretend
> as if the current behaviour is the desirable one?

I think dropping the TODO is correct, but the wording is confusing --
it has nothing to do with sparse checkouts.  I'd rather say, "Without
a specified revision, blame will only handle files present in the
current working directory and error on any other paths"

> > +test_expect_success 'blame with pathspec outside sparse definition' '
> >       init_repos &&
> > +     test_sparse_match git sparse-checkout set &&
> >
> > -     test_all_match git blame folder1/a &&
> > -     test_all_match git blame folder2/a &&
> > -     test_all_match git blame deep/deeper2/a &&
> > -     test_all_match git blame deep/deeper2/deepest/a
> > +     for file in a \
> > +                     deep/a \
> > +                     deep/deeper1/a \
> > +                     deep/deeper1/deepest/a
> > +     do
> > +             test_sparse_match test_must_fail git blame $file &&
> > +             cat >expect <<-EOF &&
> > +             fatal: Cannot lstat '"'"'$file'"'"': No such file or directory
> > +             EOF
> > +             # We compare sparse-checkout-err and sparse-index-err in
> > +             # `test_sparse_match`. Given we know they are the same, we
> > +             # only check the content of sparse-index-err here.
> > +             test_cmp expect sparse-index-err
> > +     done
> >  '
> >
> >  test_expect_success 'checkout and reset (mixed)' '
> > @@ -878,6 +892,18 @@ test_expect_success 'sparse index is not expanded: diff' '
> >       ensure_not_expanded diff --staged
> >  '
> >
> > +test_expect_success 'sparse index is not expanded: blame' '
> > +     init_repos &&
> > +
> > +     for file in a \
> > +                     deep/a \
> > +                     deep/deeper1/a \
> > +                     deep/deeper1/deepest/a
> > +     do
> > +             ensure_not_expanded blame $file
> > +     done
> > +'
> > +
> >  # NEEDSWORK: a sparse-checkout behaves differently from a full checkout
> >  # in this scenario, but it shouldn't.
> >  test_expect_success 'reset mixed and checkout orphan' '
diff mbox series

Patch

diff --git a/builtin/blame.c b/builtin/blame.c
index 641523ff9af..af3d81e2bd4 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -902,6 +902,9 @@  int cmd_blame(int argc, const char **argv, const char *prefix)
 	long anchor;
 	const int hexsz = the_hash_algo->hexsz;
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+
 	setup_default_color_by_age();
 	git_config(git_blame_config, &output_option);
 	repo_init_revisions(the_repository, &revs, NULL);
diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
index bff93f16e93..9ac76a049b8 100755
--- a/t/perf/p2000-sparse-operations.sh
+++ b/t/perf/p2000-sparse-operations.sh
@@ -115,5 +115,7 @@  test_perf_on_all git reset --hard
 test_perf_on_all git reset -- does-not-exist
 test_perf_on_all git diff
 test_perf_on_all git diff --staged
+test_perf_on_all git blame $SPARSE_CONE/a
+test_perf_on_all git blame $SPARSE_CONE/f3/a
 
 test_done
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 53524660759..da442f3dcff 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -442,21 +442,35 @@  test_expect_success 'log with pathspec outside sparse definition' '
 test_expect_success 'blame with pathspec inside sparse definition' '
 	init_repos &&
 
-	test_all_match git blame a &&
-	test_all_match git blame deep/a &&
-	test_all_match git blame deep/deeper1/a &&
-	test_all_match git blame deep/deeper1/deepest/a
+	for file in a \
+			deep/a \
+			deep/deeper1/a \
+			deep/deeper1/deepest/a
+	do
+		test_all_match git blame $file
+	done
 '
 
-# TODO: blame currently does not support blaming files outside of the
-# sparse definition. It complains that the file doesn't exist locally.
-test_expect_failure 'blame with pathspec outside sparse definition' '
+# Blame does not support blaming files outside of the sparse
+# definition, so we verify this scenario.
+test_expect_success 'blame with pathspec outside sparse definition' '
 	init_repos &&
+	test_sparse_match git sparse-checkout set &&
 
-	test_all_match git blame folder1/a &&
-	test_all_match git blame folder2/a &&
-	test_all_match git blame deep/deeper2/a &&
-	test_all_match git blame deep/deeper2/deepest/a
+	for file in a \
+			deep/a \
+			deep/deeper1/a \
+			deep/deeper1/deepest/a
+	do
+		test_sparse_match test_must_fail git blame $file &&
+		cat >expect <<-EOF &&
+		fatal: Cannot lstat '"'"'$file'"'"': No such file or directory
+		EOF
+		# We compare sparse-checkout-err and sparse-index-err in
+		# `test_sparse_match`. Given we know they are the same, we
+		# only check the content of sparse-index-err here.
+		test_cmp expect sparse-index-err
+	done
 '
 
 test_expect_success 'checkout and reset (mixed)' '
@@ -878,6 +892,18 @@  test_expect_success 'sparse index is not expanded: diff' '
 	ensure_not_expanded diff --staged
 '
 
+test_expect_success 'sparse index is not expanded: blame' '
+	init_repos &&
+
+	for file in a \
+			deep/a \
+			deep/deeper1/a \
+			deep/deeper1/deepest/a
+	do
+		ensure_not_expanded blame $file
+	done
+'
+
 # NEEDSWORK: a sparse-checkout behaves differently from a full checkout
 # in this scenario, but it shouldn't.
 test_expect_success 'reset mixed and checkout orphan' '