diff mbox series

[v2,1/2] diff: enable and test the sparse index

Message ID ac33159d020cc0c0f6fbee36eb74fff773cb8f9f.1634332836.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series Sparse Index: diff and blame builtins | expand

Commit Message

Lessley Dennington Oct. 15, 2021, 9:20 p.m. UTC
From: Lessley Dennington <lessleydennington@gmail.com>

Enable the sparse index within the 'git diff' command. Its implementation
already safely integrates with the sparse index because it shares code with
the 'git status' and 'git checkout' commands that were already integrated.
The most interesting thing to do is to add tests that verify that 'git diff'
behaves correctly when the sparse index is enabled. These cases are:

1. The index is not expanded for 'diff' and 'diff --staged'
2. 'diff' and 'diff --staged' behave the same in full checkout, sparse
checkout, and sparse index repositories in the following partially-staged
scenarios (i.e. the index, HEAD, and working directory differ at a given
path):
    1. Path is within sparse-checkout cone
    2. Path is outside sparse-checkout cone
    3. A merge conflict exists for paths outside sparse-checkout cone

The `p2000` tests demonstrate a ~30% execution time reduction for 'git
diff' and a ~75% execution time reduction for 'git diff --staged' using a
sparse index:

Test                                      before  after
-------------------------------------------------------------
2000.30: git diff (full-v3)               0.37    0.36 -2.7%
2000.31: git diff (full-v4)               0.36    0.35 -2.8%
2000.32: git diff (sparse-v3)             0.46    0.30 -34.8%
2000.33: git diff (sparse-v4)             0.43    0.31 -27.9%
2000.34: git diff --staged (full-v3)      0.08    0.08 +0.0%
2000.35: git diff --staged (full-v4)      0.08    0.08 +0.0%
2000.36: git diff --staged (sparse-v3)    0.17    0.04 -76.5%
2000.37: git diff --staged (sparse-v4)    0.16    0.04 -75.0%

Co-authored-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
---
 builtin/diff.c                           |  3 ++
 t/perf/p2000-sparse-operations.sh        |  2 ++
 t/t1092-sparse-checkout-compatibility.sh | 45 ++++++++++++++++++++++++
 3 files changed, 50 insertions(+)

Comments

Taylor Blau Oct. 25, 2021, 8:47 p.m. UTC | #1
On Fri, Oct 15, 2021 at 09:20:34PM +0000, Lessley Dennington via GitGitGadget wrote:
> From: Lessley Dennington <lessleydennington@gmail.com>
>
> Enable the sparse index within the 'git diff' command. Its implementation
> already safely integrates with the sparse index because it shares code with
> the 'git status' and 'git checkout' commands that were already integrated.

Good, it looks like most of the heavy-lifting to make `git diff` work
with the sparse index was already done elsewhere.

It may be helpful here to include either one of two things to help
readers and reviewers understand what's going on:

  - A summary of what `git status` and/or `git checkout` does to work
    with the sparse index.
  - Or the patches which make those commands work with the sparse index
    so that readers can refer back to them.

Having either of those would help readers who are unfamiliar with
builtin/diff.c convince themselves more easily that setting
'command_requires_full_index = 0' is all that's needed here.

> The most interesting thing to do is to add tests that verify that 'git diff'
> behaves correctly when the sparse index is enabled. These cases are:
>
> 1. The index is not expanded for 'diff' and 'diff --staged'
> 2. 'diff' and 'diff --staged' behave the same in full checkout, sparse
> checkout, and sparse index repositories in the following partially-staged
> scenarios (i.e. the index, HEAD, and working directory differ at a given
> path):
>     1. Path is within sparse-checkout cone
>     2. Path is outside sparse-checkout cone
>     3. A merge conflict exists for paths outside sparse-checkout cone

Nice, these are all of the test cases that I would expect to demonstrate
interesting behavior.

> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index f19c1b3e2eb..e5d15be9d45 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -386,6 +386,46 @@ test_expect_success 'diff --staged' '
>  	test_all_match git diff --staged
>  '
>
> +test_expect_success 'diff partially-staged' '
> +	init_repos &&
> +
> +	write_script edit-contents <<-\EOF &&
> +	echo text >>$1
> +	EOF
> +
> +	# Add file within cone
> +	test_sparse_match git sparse-checkout set deep &&
> +	run_on_all ../edit-contents deep/testfile &&
> +	test_all_match git add deep/testfile &&
> +	run_on_all ../edit-contents deep/testfile &&
> +
> +	test_all_match git diff &&
> +	test_all_match git diff --staged &&
> +
> +	# Add file outside cone
> +	test_all_match git reset --hard &&
> +	run_on_all mkdir newdirectory &&
> +	run_on_all ../edit-contents newdirectory/testfile &&
> +	test_sparse_match git sparse-checkout set newdirectory &&
> +	test_all_match git add newdirectory/testfile &&
> +	run_on_all ../edit-contents newdirectory/testfile &&
> +	test_sparse_match git sparse-checkout set &&
> +
> +	test_all_match git diff &&
> +	test_all_match git diff --staged &&
> +
> +	# Merge conflict outside cone
> +	# The sparse checkout will report a warning that is not in the
> +	# full checkout, so we use `run_on_all` instead of
> +	# `test_all_match`
> +	run_on_all git reset --hard &&
> +	test_all_match git checkout merge-left &&
> +	test_all_match test_must_fail git merge merge-right &&
> +
> +	test_all_match git diff &&
> +	test_all_match git diff --staged
> +'
> +
>  # NEEDSWORK: sparse-checkout behaves differently from full-checkout when
>  # running this test with 'df-conflict-2' after 'df-conflict-1'.
>  test_expect_success 'diff with renames and conflicts' '
> @@ -800,6 +840,11 @@ test_expect_success 'sparse-index is not expanded' '
>  	# Wildcard identifies only full sparse directories, no index expansion
>  	ensure_not_expanded reset deepest -- folder\* &&
>
> +	echo a test change >>sparse-index/README.md &&
> +	ensure_not_expanded diff &&

Thinking aloud here as somebody who is unfamiliar with the sparse-index
tests. ensure_not_expanded relies on the existence of the "sparse-index"
repository, and its top-level README.md is outside of the
sparse-checkout cone.

That makes sense, and when I create a repository with a file outside of
the sparse-checkout cone and then run `git diff`, I see no changes as
expected.

But isn't the top-level directory always part of the cone? If so, I
think that what this (and the below test) is demonstrating is that we
can show changes inside of the cone without expanding the sparse-index.

Having that test makes absolute sense to me. But I think it might also
make sense to have a test that creates some directory structure outside
of the cone, modifies it, and then ensures that both (a) those changes
aren't visible to `git diff` when the sparse-checkout is active and (b)
that running `git diff` doesn't cause the sparse-index to be expanded.

Thanks,
Taylor
Lessley Dennington Oct. 26, 2021, 4:10 p.m. UTC | #2
On 10/25/21 1:47 PM, Taylor Blau wrote:
> On Fri, Oct 15, 2021 at 09:20:34PM +0000, Lessley Dennington via GitGitGadget wrote:
>> From: Lessley Dennington <lessleydennington@gmail.com>
>>
>> Enable the sparse index within the 'git diff' command. Its implementation
>> already safely integrates with the sparse index because it shares code with
>> the 'git status' and 'git checkout' commands that were already integrated.
> 
> Good, it looks like most of the heavy-lifting to make `git diff` work
> with the sparse index was already done elsewhere.
> 
> It may be helpful here to include either one of two things to help
> readers and reviewers understand what's going on:
> 
>    - A summary of what `git status` and/or `git checkout` does to work
>      with the sparse index.
>    - Or the patches which make those commands work with the sparse index
>      so that readers can refer back to them.
> 
> Having either of those would help readers who are unfamiliar with
> builtin/diff.c convince themselves more easily that setting
> 'command_requires_full_index = 0' is all that's needed here.
> 
Great suggestion, thank you!
>> The most interesting thing to do is to add tests that verify that 'git diff'
>> behaves correctly when the sparse index is enabled. These cases are:
>>
>> 1. The index is not expanded for 'diff' and 'diff --staged'
>> 2. 'diff' and 'diff --staged' behave the same in full checkout, sparse
>> checkout, and sparse index repositories in the following partially-staged
>> scenarios (i.e. the index, HEAD, and working directory differ at a given
>> path):
>>      1. Path is within sparse-checkout cone
>>      2. Path is outside sparse-checkout cone
>>      3. A merge conflict exists for paths outside sparse-checkout cone
> 
> Nice, these are all of the test cases that I would expect to demonstrate
> interesting behavior.
> 
>> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
>> index f19c1b3e2eb..e5d15be9d45 100755
>> --- a/t/t1092-sparse-checkout-compatibility.sh
>> +++ b/t/t1092-sparse-checkout-compatibility.sh
>> @@ -386,6 +386,46 @@ test_expect_success 'diff --staged' '
>>   	test_all_match git diff --staged
>>   '
>>
>> +test_expect_success 'diff partially-staged' '
>> +	init_repos &&
>> +
>> +	write_script edit-contents <<-\EOF &&
>> +	echo text >>$1
>> +	EOF
>> +
>> +	# Add file within cone
>> +	test_sparse_match git sparse-checkout set deep &&
>> +	run_on_all ../edit-contents deep/testfile &&
>> +	test_all_match git add deep/testfile &&
>> +	run_on_all ../edit-contents deep/testfile &&
>> +
>> +	test_all_match git diff &&
>> +	test_all_match git diff --staged &&
>> +
>> +	# Add file outside cone
>> +	test_all_match git reset --hard &&
>> +	run_on_all mkdir newdirectory &&
>> +	run_on_all ../edit-contents newdirectory/testfile &&
>> +	test_sparse_match git sparse-checkout set newdirectory &&
>> +	test_all_match git add newdirectory/testfile &&
>> +	run_on_all ../edit-contents newdirectory/testfile &&
>> +	test_sparse_match git sparse-checkout set &&
>> +
>> +	test_all_match git diff &&
>> +	test_all_match git diff --staged &&
>> +
>> +	# Merge conflict outside cone
>> +	# The sparse checkout will report a warning that is not in the
>> +	# full checkout, so we use `run_on_all` instead of
>> +	# `test_all_match`
>> +	run_on_all git reset --hard &&
>> +	test_all_match git checkout merge-left &&
>> +	test_all_match test_must_fail git merge merge-right &&
>> +
>> +	test_all_match git diff &&
>> +	test_all_match git diff --staged
>> +'
>> +
>>   # NEEDSWORK: sparse-checkout behaves differently from full-checkout when
>>   # running this test with 'df-conflict-2' after 'df-conflict-1'.
>>   test_expect_success 'diff with renames and conflicts' '
>> @@ -800,6 +840,11 @@ test_expect_success 'sparse-index is not expanded' '
>>   	# Wildcard identifies only full sparse directories, no index expansion
>>   	ensure_not_expanded reset deepest -- folder\* &&
>>
>> +	echo a test change >>sparse-index/README.md &&
>> +	ensure_not_expanded diff &&
> 
> Thinking aloud here as somebody who is unfamiliar with the sparse-index
> tests. ensure_not_expanded relies on the existence of the "sparse-index"
> repository, and its top-level README.md is outside of the
> sparse-checkout cone.
> 
> That makes sense, and when I create a repository with a file outside of
> the sparse-checkout cone and then run `git diff`, I see no changes as
> expected.
> 
> But isn't the top-level directory always part of the cone? If so, I
> think that what this (and the below test) is demonstrating is that we
> can show changes inside of the cone without expanding the sparse-index.
> 
> Having that test makes absolute sense to me. But I think it might also
> make sense to have a test that creates some directory structure outside
> of the cone, modifies it, and then ensures that both (a) those changes
> aren't visible to `git diff` when the sparse-checkout is active and (b)
> that running `git diff` doesn't cause the sparse-index to be expanded.
> 
README.md is actually within the sparse checkout cone - all files at 
root are included by default. So your understanding is correct - we are 
ensuring that making a change to a file in the cone and running both 
diff and diff --staged once the file is in the index doesn't expand the 
sparse index.

I like your idea of verifying that running diff against files outside 
the sparse checkout cone won't expand the index. I've updated the diff 
tests in v3 (which I will send out shortly) to do so.

> Thanks,
> Taylor
> 
Best,
Lessley
Taylor Blau Oct. 26, 2021, 4:15 p.m. UTC | #3
On Tue, Oct 26, 2021 at 09:10:20AM -0700, Lessley Dennington wrote:
> > But isn't the top-level directory always part of the cone? If so, I
> > think that what this (and the below test) is demonstrating is that we
> > can show changes inside of the cone without expanding the sparse-index.
> >
> > Having that test makes absolute sense to me. But I think it might also
> > make sense to have a test that creates some directory structure outside
> > of the cone, modifies it, and then ensures that both (a) those changes
> > aren't visible to `git diff` when the sparse-checkout is active and (b)
> > that running `git diff` doesn't cause the sparse-index to be expanded.
> >
> README.md is actually within the sparse checkout cone - all files at root
> are included by default. So your understanding is correct - we are ensuring
> that making a change to a file in the cone and running both diff and diff
> --staged once the file is in the index doesn't expand the sparse index.
>
> I like your idea of verifying that running diff against files outside the
> sparse checkout cone won't expand the index. I've updated the diff tests in
> v3 (which I will send out shortly) to do so.

Great, thank you! There is no hurry to send out an updated revision from
me, either. It may be good to wait a day or two and see if any other
review trickles in before sending another revision to the list that way
you can batch together updates from multiple reviewers.

Thanks,
Taylor
diff mbox series

Patch

diff --git a/builtin/diff.c b/builtin/diff.c
index dd8ce688ba7..cbf7b51c7c0 100644
--- a/builtin/diff.c
+++ b/builtin/diff.c
@@ -437,6 +437,9 @@  int cmd_diff(int argc, const char **argv, const char *prefix)
 
 	prefix = setup_git_directory_gently(&nongit);
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+
 	if (!no_index) {
 		/*
 		 * Treat git diff with at least one path outside of the
diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
index bfd332120c8..bff93f16e93 100755
--- a/t/perf/p2000-sparse-operations.sh
+++ b/t/perf/p2000-sparse-operations.sh
@@ -113,5 +113,7 @@  test_perf_on_all git checkout -f -
 test_perf_on_all git reset
 test_perf_on_all git reset --hard
 test_perf_on_all git reset -- does-not-exist
+test_perf_on_all git diff
+test_perf_on_all git diff --staged
 
 test_done
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index f19c1b3e2eb..e5d15be9d45 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -386,6 +386,46 @@  test_expect_success 'diff --staged' '
 	test_all_match git diff --staged
 '
 
+test_expect_success 'diff partially-staged' '
+	init_repos &&
+
+	write_script edit-contents <<-\EOF &&
+	echo text >>$1
+	EOF
+
+	# Add file within cone
+	test_sparse_match git sparse-checkout set deep &&
+	run_on_all ../edit-contents deep/testfile &&
+	test_all_match git add deep/testfile &&
+	run_on_all ../edit-contents deep/testfile &&
+
+	test_all_match git diff &&
+	test_all_match git diff --staged &&
+
+	# Add file outside cone
+	test_all_match git reset --hard &&
+	run_on_all mkdir newdirectory &&
+	run_on_all ../edit-contents newdirectory/testfile &&
+	test_sparse_match git sparse-checkout set newdirectory &&
+	test_all_match git add newdirectory/testfile &&
+	run_on_all ../edit-contents newdirectory/testfile &&
+	test_sparse_match git sparse-checkout set &&
+
+	test_all_match git diff &&
+	test_all_match git diff --staged &&
+
+	# Merge conflict outside cone
+	# The sparse checkout will report a warning that is not in the
+	# full checkout, so we use `run_on_all` instead of
+	# `test_all_match`
+	run_on_all git reset --hard &&
+	test_all_match git checkout merge-left &&
+	test_all_match test_must_fail git merge merge-right &&
+
+	test_all_match git diff &&
+	test_all_match git diff --staged
+'
+
 # NEEDSWORK: sparse-checkout behaves differently from full-checkout when
 # running this test with 'df-conflict-2' after 'df-conflict-1'.
 test_expect_success 'diff with renames and conflicts' '
@@ -800,6 +840,11 @@  test_expect_success 'sparse-index is not expanded' '
 	# Wildcard identifies only full sparse directories, no index expansion
 	ensure_not_expanded reset deepest -- folder\* &&
 
+	echo a test change >>sparse-index/README.md &&
+	ensure_not_expanded diff &&
+	git -C sparse-index add README.md &&
+	ensure_not_expanded diff --staged &&
+
 	ensure_not_expanded checkout -f update-deep &&
 	test_config -C sparse-index pull.twohead ort &&
 	(