[v3,1/2] diff: enable and test the sparse index

Message ID	991aaad37b41f71faa19fdef4373ccc115edcc40.1635802069.git.gitgitgadget@gmail.com (mailing list archive)
State	Superseded
Headers	Return-Path: <git-owner@kernel.org>
	Message-Id: <991aaad37b41f71faa19fdef4373ccc115edcc40.1635802069.git.gitgitgadget@gmail.com>
	In-Reply-To: <pull.1050.v3.git.1635802069.gitgitgadget@gmail.com>
	References: <pull.1050.v2.git.1634332835.gitgitgadget@gmail.com>
	        <pull.1050.v3.git.1635802069.gitgitgadget@gmail.com>
	Date: Mon, 01 Nov 2021 21:27:48 +0000
	Subject: [PATCH v3 1/2] diff: enable and test the sparse index
	Fcc: Sent
	Content-Type: text/plain; charset=UTF-8
	Content-Transfer-Encoding: 8bit
	MIME-Version: 1.0
	To: git@vger.kernel.org
	Cc: stolee@gmail.com, gitster@pobox.com, newren@gmail.com,
	        Taylor Blau <me@ttaylorr.com>,
	        Lessley Dennington <lessleydennington@gmail.com>,
	        Lessley Dennington <lessleydennington@gmail.com>
	Precedence: bulk
	From: Lessley Dennington <lessleydennington@gmail.com>
Series	Sparse Index: diff and blame builtins
	[v3,0/2] Sparse Index: diff and blame builtins
	[v3,1/2] diff: enable and test the sparse index
	[v3,2/2] blame: enable and test the sparse index

Commit Message

Lessley Dennington Nov. 1, 2021, 9:27 p.m. UTC

From: Lessley Dennington <lessleydennington@gmail.com>

Enable the sparse index within the 'git diff' command. Its implementation
already safely integrates with the sparse index because it shares code with
the 'git status' and 'git checkout' commands that were already integrated.
For more details see:

d76723ee53 (status: use sparse-index throughout, 2021-07-14)
1ba5f45132 (checkout: stop expanding sparse indexes, 2021-06-29)

The most interesting thing to do is to add tests that verify that 'git diff'
behaves correctly when the sparse index is enabled. These cases are:

1. The index is not expanded for 'diff' and 'diff --staged'
2. 'diff' and 'diff --staged' behave the same in full checkout, sparse
checkout, and sparse index repositories in the following partially-staged
scenarios (i.e. the index, HEAD, and working directory differ at a given
path):
    1. Path is within sparse-checkout cone
    2. Path is outside sparse-checkout cone
    3. A merge conflict exists for paths outside sparse-checkout cone

The `p2000` tests demonstrate a ~30% execution time reduction for 'git
diff' and a ~75% execution time reduction for 'git diff --staged' using a
sparse index:

Test                                      before  after
-------------------------------------------------------------
2000.30: git diff (full-v3)               0.37    0.36 -2.7%
2000.31: git diff (full-v4)               0.36    0.35 -2.8%
2000.32: git diff (sparse-v3)             0.46    0.30 -34.8%
2000.33: git diff (sparse-v4)             0.43    0.31 -27.9%
2000.34: git diff --staged (full-v3)      0.08    0.08 +0.0%
2000.35: git diff --staged (full-v4)      0.08    0.08 +0.0%
2000.36: git diff --staged (sparse-v3)    0.17    0.04 -76.5%
2000.37: git diff --staged (sparse-v4)    0.16    0.04 -75.0%

Co-authored-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
---
 builtin/diff.c                           |  3 ++
 t/perf/p2000-sparse-operations.sh        |  2 ++
 t/t1092-sparse-checkout-compatibility.sh | 46 ++++++++++++++++++++++++
 3 files changed, 51 insertions(+)
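
The percentages in the timing table above are consistent with a plain
relative change, (after - before) / before, applied to the printed
times. The following is a minimal sketch of that arithmetic, not the
perf suite's own code; the helper names `percent_change` and
`print_perf_row` are hypothetical and exist only for illustration.

```c
#include <stdio.h>

/*
 * Relative change from `before` to `after`, as a percentage.
 * A negative result means the command got faster.
 * Hypothetical helper, not part of git's perf tooling.
 */
static double percent_change(double before, double after)
{
	return (after - before) / before * 100.0;
}

/* Print one row in roughly the layout used by the table above. */
static void print_perf_row(const char *name, double before, double after)
{
	printf("%-40s %.2f    %.2f %+.1f%%\n",
	       name, before, after, percent_change(before, after));
}
```

For example, calling percent_change(0.17, 0.04) gives roughly -76.5,
which matches the "git diff --staged (sparse-v3)" row. Because the
inputs are times already rounded to two decimals, small differences
such as 0.37 versus 0.36 are likely within measurement noise.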

Comments

Junio C Hamano Nov. 3, 2021, 5:05 p.m. UTC | #1

"Lessley Dennington via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> 2000.34: git diff --staged (full-v3)      0.08    0.08 +0.0%
> 2000.35: git diff --staged (full-v4)      0.08    0.08 +0.0%
> 2000.36: git diff --staged (sparse-v3)    0.17    0.04 -76.5%
> 2000.37: git diff --staged (sparse-v4)    0.16    0.04 -75.0%

Please do not add more uses of the synonym to the test suite, other
than the one that makes sure the synonym works the same way as the
real option, which is "--cached".

> diff --git a/builtin/diff.c b/builtin/diff.c
> index dd8ce688ba7..cbf7b51c7c0 100644
> --- a/builtin/diff.c
> +++ b/builtin/diff.c
> @@ -437,6 +437,9 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
>  
>  	prefix = setup_git_directory_gently(&nongit);
>  
> +	prepare_repo_settings(the_repository);
> +	the_repository->settings.command_requires_full_index = 0;
> +

Doesn't the code need to be protected with

	if (!nongit) {
		prepare_repo_settings(the_repository);
		the_repository->settings.command_requires_full_index = 0;
	}

at the very least?  It may be that the code is getting lucky because
the_repository may be initialized with a random value (after all,
when we are not in a repository, there is nowhere to read the
on-disk settings from) and we may even be able to set a bit in the
settings structure without crashing, but conceptually, doing the
above when we _know_ we are not in any repository is simply wrong.

I wonder if prepare_repo_settings() needs to be more strict.  For
example, shouldn't it check if we have a repository to begin with
and BUG() if it was called when there is not a repository?  After
all, it tries to read from the repository configuration file, so any
necessary set-up to discover where the gitdir is must have been done
already before it can be called.

With such a safety feature to catch programmer errors, perhaps the
above could have been caught before the patch hit the list.

Thoughts?  Am I missing some chicken-and-egg situation where
prepare_repo_settings() must be callable before we know where the
repository is, or something, which justifies why the function is so
loose in its sanity checks in the current form?
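
The two suggestions above (guard the caller with `!nongit`, and make
the settings loader refuse to run without a repository) can be
sketched together as follows. This is only an illustration under
stated assumptions: `sketch_repo`, `sketch_settings`,
`sketch_prepare_settings` and `sketch_enable_sparse_index` are
hypothetical stand-ins, not git's actual types or functions, and the
misuse case returns -1 so it can be exercised in a test, where real
code would presumably call BUG() instead.

```c
#include <stddef.h>

/* Hypothetical stand-ins for git's types; the real definitions differ. */
struct sketch_settings {
	int initialized;
	int command_requires_full_index;
};

struct sketch_repo {
	const char *gitdir; /* NULL when no repository was discovered */
	struct sketch_settings settings;
};

/*
 * Stricter settings loader: refuse to run without a discovered gitdir.
 * Returns 0 on success and -1 on misuse; real code would presumably
 * call BUG() here rather than return an error.
 */
static int sketch_prepare_settings(struct sketch_repo *r)
{
	if (!r->gitdir)
		return -1;
	if (!r->settings.initialized) {
		r->settings.initialized = 1;
		/* Assumed conservative default: expand to a full index. */
		r->settings.command_requires_full_index = 1;
	}
	return 0;
}

/*
 * Caller-side guard mirroring the `if (!nongit)` suggestion: only opt
 * in to the sparse index when a repository was actually found.
 */
static int sketch_enable_sparse_index(struct sketch_repo *r, int nongit)
{
	if (nongit)
		return 0; /* nothing to configure outside a repository */
	if (sketch_prepare_settings(r) < 0)
		return -1;
	r->settings.command_requires_full_index = 0;
	return 0;
}
```

With both checks in place, a caller that forgets the `nongit` guard
trips the loader's check immediately instead of silently writing into
settings that were never read from disk.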

Lessley Dennington Nov. 4, 2021, 11:55 p.m. UTC | #2

On 11/3/21 10:05 AM, Junio C Hamano wrote:
> "Lessley Dennington via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
> 
>> 2000.34: git diff --staged (full-v3)      0.08    0.08 +0.0%
>> 2000.35: git diff --staged (full-v4)      0.08    0.08 +0.0%
>> 2000.36: git diff --staged (sparse-v3)    0.17    0.04 -76.5%
>> 2000.37: git diff --staged (sparse-v4)    0.16    0.04 -75.0%
> 
> Please do not add more uses of the synonym to the test suite, other
> than the one that makes sure the synonym works the same way as the
> real option, which is "--cached".
>

Thank you, changed for v4.

>> diff --git a/builtin/diff.c b/builtin/diff.c
>> index dd8ce688ba7..cbf7b51c7c0 100644
>> --- a/builtin/diff.c
>> +++ b/builtin/diff.c
>> @@ -437,6 +437,9 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
>>   
>>   	prefix = setup_git_directory_gently(&nongit);
>>   
>> +	prepare_repo_settings(the_repository);
>> +	the_repository->settings.command_requires_full_index = 0;
>> +
> 
> Doesn't the code need to be protected with
> 
> 	if (!nongit) {
> 		prepare_repo_settings(the_repository);
> 		the_repository->settings.command_requires_full_index = 0;
> 	}
> 
> at the very least?  It may be that the code is getting lucky because
> the_repository may be initialized with a random value (after all,
> when we are not in a repository, there is nowhere to read the
> on-disk settings from) and we may even be able to set a bit in the
> settings structure without crashing, but conceptually, doing the
> above when we _know_ we are not in any repository is simply wrong.
> 
> I wonder if prepare_repo_settings() needs to be more strict.  For
> example, shouldn't it check if we have a repository to begin with
> and BUG() if it was called when there is not a repository?  After
> all, it tries to read from the repository configuration file, so any
> necessary set-up to discover where the gitdir is must have been done
> already before it can be called.
> 
> With such a safety feature to catch programmer errors, perhaps the
> above could have been caught before the patch hit the list.
> 
> Thoughts?  Am I missing some chicken-and-egg situation where
> prepare_repo_settings() must be callable before we know where the
> repository is, or something, which justifies why the function is so
> loose in its sanity checks in the current form?
> 
> 

This seems like a good idea. I've added both the nongit check and the 
prepare_repo_settings() updates you've suggested for v4, pending review 
by my team.

Best,
Lessley

diff --git a/builtin/diff.c b/builtin/diff.c
index dd8ce688ba7..cbf7b51c7c0 100644
--- a/builtin/diff.c
+++ b/builtin/diff.c
@@ -437,6 +437,9 @@  int cmd_diff(int argc, const char **argv, const char *prefix)
 
 	prefix = setup_git_directory_gently(&nongit);
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+
 	if (!no_index) {
 		/*
 		 * Treat git diff with at least one path outside of the
diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
index bfd332120c8..bff93f16e93 100755
--- a/t/perf/p2000-sparse-operations.sh
+++ b/t/perf/p2000-sparse-operations.sh
@@ -113,5 +113,7 @@  test_perf_on_all git checkout -f -
 test_perf_on_all git reset
 test_perf_on_all git reset --hard
 test_perf_on_all git reset -- does-not-exist
+test_perf_on_all git diff
+test_perf_on_all git diff --staged
 
 test_done
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 44d5e11c762..53524660759 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -832,6 +832,52 @@  test_expect_success 'sparse-index is not expanded: merge conflict in cone' '
 	)
 '
 
+test_expect_success 'sparse index is not expanded: diff' '
+	init_repos &&
+
+	write_script edit-contents <<-\EOF &&
+	echo text >>$1
+	EOF
+
+	# Add file within cone
+	test_sparse_match git sparse-checkout set deep &&
+	run_on_all ../edit-contents deep/testfile &&
+	test_all_match git add deep/testfile &&
+	run_on_all ../edit-contents deep/testfile &&
+
+	test_all_match git diff &&
+	test_all_match git diff --staged &&
+	ensure_not_expanded diff &&
+	ensure_not_expanded diff --staged &&
+
+	# Add file outside cone
+	test_all_match git reset --hard &&
+	run_on_all mkdir newdirectory &&
+	run_on_all ../edit-contents newdirectory/testfile &&
+	test_sparse_match git sparse-checkout set newdirectory &&
+	test_all_match git add newdirectory/testfile &&
+	run_on_all ../edit-contents newdirectory/testfile &&
+	test_sparse_match git sparse-checkout set &&
+
+	test_all_match git diff &&
+	test_all_match git diff --staged &&
+	ensure_not_expanded diff &&
+	ensure_not_expanded diff --staged &&
+
+	# Merge conflict outside cone
+	# The sparse checkout will report a warning that is not in the
+	# full checkout, so we use `run_on_all` instead of
+	# `test_all_match`
+	run_on_all git reset --hard &&
+	test_all_match git checkout merge-left &&
+	test_all_match test_must_fail git merge merge-right &&
+
+	test_all_match git diff &&
+	test_all_match git diff --staged &&
+	ensure_not_expanded diff &&
+	ensure_not_expanded diff --staged
+'
+
 # NEEDSWORK: a sparse-checkout behaves differently from a full checkout
 # in this scenario, but it shouldn't.
 test_expect_success 'reset mixed and checkout orphan' '
