diff mbox series

[v4,4/4] blame: enable and test the sparse index

Message ID 7acf5118bf5602fbafca2d42c4f363b5adbcbd54.1637620958.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series Sparse Index: diff and blame builtins | expand

Commit Message

Lessley Dennington Nov. 22, 2021, 10:42 p.m. UTC
From: Lessley Dennington <lessleydennington@gmail.com>

Enable the sparse index for the 'git blame' command. The index was already
not expanded with this command, so the most interesting thing to do is to
add tests that verify that 'git blame' behaves correctly when the sparse
index is enabled and that its performance improves. More specifically, these
cases are:

1. The index is not expanded for 'blame' when given paths in the sparse
checkout cone at multiple levels.

2. Performance measurably improves for 'blame' with sparse index when given
paths in the sparse checkout cone at multiple levels.

The `p2000` tests demonstrate a ~60% execution time reduction when running
'blame' for a file two levels deep and and a ~30% execution time reduction
for a file three levels deep.

Test                                         before  after
----------------------------------------------------------------
2000.62: git blame f2/f4/a (full-v3)         0.31    0.32 +3.2%
2000.63: git blame f2/f4/a (full-v4)         0.29    0.31 +6.9%
2000.64: git blame f2/f4/a (sparse-v3)       0.55    0.23 -58.2%
2000.65: git blame f2/f4/a (sparse-v4)       0.57    0.23 -59.6%
2000.66: git blame f2/f4/f3/a (full-v3)      0.77    0.85 +10.4%
2000.67: git blame f2/f4/f3/a (full-v4)      0.78    0.81 +3.8%
2000.68: git blame f2/f4/f3/a (sparse-v3)    1.07    0.72 -32.7%
2000.99: git blame f2/f4/f3/a (sparse-v4)    1.05    0.73 -30.5%

We do not include paths outside the sparse checkout cone because blame
currently does not support blaming files outside of the sparse definition.
Attempting to do so fails with the following error:

  fatal: no such path '<path outside sparse definition>' in HEAD

Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
---
 builtin/blame.c                          |  5 +++
 t/perf/p2000-sparse-operations.sh        |  4 +-
 t/t1092-sparse-checkout-compatibility.sh | 49 ++++++++++++++++++------
 3 files changed, 46 insertions(+), 12 deletions(-)

Comments

Junio C Hamano Nov. 23, 2021, 11:53 p.m. UTC | #1
"Lessley Dennington via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> diff --git a/builtin/blame.c b/builtin/blame.c
> index 641523ff9af..247b9eaf88f 100644
> --- a/builtin/blame.c
> +++ b/builtin/blame.c
> @@ -902,6 +902,11 @@ int cmd_blame(int argc, const char **argv, const char *prefix)
>  	long anchor;
>  	const int hexsz = the_hash_algo->hexsz;
>  
> +	if (startup_info->have_repository) {

The command is marked with RUN_SETUP bit in git.c::commands[] list,
so I would think we wouldn't even get called if we are not in a
repository here.

Under what condition can startup_info->have_repository be false at
this point in the control flow?  If there is such a case, it would
mean that startup_info->have_repository bit can be false even if we
are in a repository.  That sounds like a bug in some code (I do not
know where offhand) that is supposed to prepare the startup_info
before cmd_X() functions are called.

> diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
> index 5cf94627383..9ac76a049b8 100755
> --- a/t/perf/p2000-sparse-operations.sh
> +++ b/t/perf/p2000-sparse-operations.sh
> @@ -114,6 +114,8 @@ test_perf_on_all git reset
>  test_perf_on_all git reset --hard
>  test_perf_on_all git reset -- does-not-exist
>  test_perf_on_all git diff
> -test_perf_on_all git diff --cached
> +test_perf_on_all git diff --staged

That's a funny revert of what the previous step did; I thought this
step was about "blame" and not "diff".

> +test_perf_on_all git blame $SPARSE_CONE/a
> +test_perf_on_all git blame $SPARSE_CONE/f3/a
>  
>  test_done
> -# TODO: blame currently does not support blaming files outside of the
> -# sparse definition. It complains that the file doesn't exist locally.
> -test_expect_failure 'blame with pathspec outside sparse definition' '
> +# NEEDSWORK: This test documents the current behavior, but this could
> +# change in the future if we decide to support blaming files outside
> +# the sparse definition.

OK.  From the description it is clear that we do not support it
right now, which is OK by me.
Lessley Dennington Nov. 24, 2021, 2:52 p.m. UTC | #2
On 11/23/21 3:53 PM, Junio C Hamano wrote:
> "Lessley Dennington via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
> 
>> diff --git a/builtin/blame.c b/builtin/blame.c
>> index 641523ff9af..247b9eaf88f 100644
>> --- a/builtin/blame.c
>> +++ b/builtin/blame.c
>> @@ -902,6 +902,11 @@ int cmd_blame(int argc, const char **argv, const char *prefix)
>>   	long anchor;
>>   	const int hexsz = the_hash_algo->hexsz;
>>   
>> +	if (startup_info->have_repository) {
> 
> The command is marked with RUN_SETUP bit in git.c::commands[] list,
> so I would think we wouldn't even get called if we are not in a
> repository here.
> 
> Under what condition can startup_info->have_repository be false at
> this point in the control flow?  If there is such a case, it would
> mean that startup_info->have_repository bit can be false even if we
> are in a repository.  That sounds like a bug in some code (I do not
> know where offhand) that is supposed to prepare the startup_info
> before cmd_X() functions are called.
> 
As with checkout and pack-objects, this was added to protect against
the new BUG() in prepare_repo_settings being triggered with the help
command (which was surfaced with failures in t0012-help.sh). As with
those builtins, I will be removing the conditional and moving
prepare_repo_settings after parse_options in v5.
>> diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
>> index 5cf94627383..9ac76a049b8 100755
>> --- a/t/perf/p2000-sparse-operations.sh
>> +++ b/t/perf/p2000-sparse-operations.sh
>> @@ -114,6 +114,8 @@ test_perf_on_all git reset
>>   test_perf_on_all git reset --hard
>>   test_perf_on_all git reset -- does-not-exist
>>   test_perf_on_all git diff
>> -test_perf_on_all git diff --cached
>> +test_perf_on_all git diff --staged
> 
> That's a funny revert of what the previous step did; I thought this
> step was about "blame" and not "diff".
> 
>> +test_perf_on_all git blame $SPARSE_CONE/a
>> +test_perf_on_all git blame $SPARSE_CONE/f3/a
>>   
>>   test_done
>> -# TODO: blame currently does not support blaming files outside of the
>> -# sparse definition. It complains that the file doesn't exist locally.
>> -test_expect_failure 'blame with pathspec outside sparse definition' '
>> +# NEEDSWORK: This test documents the current behavior, but this could
>> +# change in the future if we decide to support blaming files outside
>> +# the sparse definition.
> 
> OK.  From the description it is clear that we do not support it
> right now, which is OK by me.
> 
Elijah made the point that blaming files outside the directory isn't
supported in full checkouts either. So I've updated the comment and
commit description a bit for v5.

Lessley
diff mbox series

Patch

diff --git a/builtin/blame.c b/builtin/blame.c
index 641523ff9af..247b9eaf88f 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -902,6 +902,11 @@  int cmd_blame(int argc, const char **argv, const char *prefix)
 	long anchor;
 	const int hexsz = the_hash_algo->hexsz;
 
+	if (startup_info->have_repository) {
+		prepare_repo_settings(the_repository);
+		the_repository->settings.command_requires_full_index = 0;
+	}
+
 	setup_default_color_by_age();
 	git_config(git_blame_config, &output_option);
 	repo_init_revisions(the_repository, &revs, NULL);
diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
index 5cf94627383..9ac76a049b8 100755
--- a/t/perf/p2000-sparse-operations.sh
+++ b/t/perf/p2000-sparse-operations.sh
@@ -114,6 +114,8 @@  test_perf_on_all git reset
 test_perf_on_all git reset --hard
 test_perf_on_all git reset -- does-not-exist
 test_perf_on_all git diff
-test_perf_on_all git diff --cached
+test_perf_on_all git diff --staged
+test_perf_on_all git blame $SPARSE_CONE/a
+test_perf_on_all git blame $SPARSE_CONE/f3/a
 
 test_done
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 53524660759..d680dbd8867 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -442,21 +442,36 @@  test_expect_success 'log with pathspec outside sparse definition' '
 test_expect_success 'blame with pathspec inside sparse definition' '
 	init_repos &&
 
-	test_all_match git blame a &&
-	test_all_match git blame deep/a &&
-	test_all_match git blame deep/deeper1/a &&
-	test_all_match git blame deep/deeper1/deepest/a
+	for file in a \
+			deep/a \
+			deep/deeper1/a \
+			deep/deeper1/deepest/a
+	do
+		test_all_match git blame $file
+	done
 '
 
-# TODO: blame currently does not support blaming files outside of the
-# sparse definition. It complains that the file doesn't exist locally.
-test_expect_failure 'blame with pathspec outside sparse definition' '
+# NEEDSWORK: This test documents the current behavior, but this could
+# change in the future if we decide to support blaming files outside
+# the sparse definition.
+test_expect_success 'blame with pathspec outside sparse definition' '
 	init_repos &&
+	test_sparse_match git sparse-checkout set &&
 
-	test_all_match git blame folder1/a &&
-	test_all_match git blame folder2/a &&
-	test_all_match git blame deep/deeper2/a &&
-	test_all_match git blame deep/deeper2/deepest/a
+	for file in a \
+			deep/a \
+			deep/deeper1/a \
+			deep/deeper1/deepest/a
+	do
+		test_sparse_match test_must_fail git blame $file &&
+		cat >expect <<-EOF &&
+		fatal: Cannot lstat '"'"'$file'"'"': No such file or directory
+		EOF
+		# We compare sparse-checkout-err and sparse-index-err in
+		# `test_sparse_match`. Given we know they are the same, we
+		# only check the content of sparse-index-err here.
+		test_cmp expect sparse-index-err
+	done
 '
 
 test_expect_success 'checkout and reset (mixed)' '
@@ -878,6 +893,18 @@  test_expect_success 'sparse index is not expanded: diff' '
 	ensure_not_expanded diff --staged
 '
 
+test_expect_success 'sparse index is not expanded: blame' '
+	init_repos &&
+
+	for file in a \
+			deep/a \
+			deep/deeper1/a \
+			deep/deeper1/deepest/a
+	do
+		ensure_not_expanded blame $file
+	done
+'
+
 # NEEDSWORK: a sparse-checkout behaves differently from a full checkout
 # in this scenario, but it shouldn't.
 test_expect_success 'reset mixed and checkout orphan' '