diff mbox series

[v4,4/4] sparse-index: update do_read_index to ensure correct sparsity

Message ID b67cd9d07f8a3802e5b50d58965f283620cd3876.1635515487.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series sparse-index: expand/collapse based on 'index.sparse' | expand

Commit Message

Victoria Dye Oct. 29, 2021, 1:51 p.m. UTC
From: Victoria Dye <vdye@github.com>

If `command_requires_full_index` is false, ensure correct in-core index
sparsity on read by calling `ensure_correct_sparsity`. This change is meant
to update the how the index is read in a command after sparse index-related
repository settings are modified. Previously, for example, if `index.sparse`
were changed from `true` to `false`, the in-core index on the next command
would be sparse. The index would only be expanded to full when it was next
written to disk.

By adding a call to `ensure_correct_sparsity`, the in-core index now matches
the sparsity dictated by the relevant repository settings as soon as it is
read into memory, rather than when it is later written to disk.

Helped-by: Junio C Hamano <gitster@pobox.com>
Co-authored-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 read-cache.c                             |  8 ++++++
 t/t1092-sparse-checkout-compatibility.sh | 31 ++++++++++++++++++++++++
 2 files changed, 39 insertions(+)

Comments

Elijah Newren Nov. 22, 2021, 5:36 p.m. UTC | #1
On Fri, Oct 29, 2021 at 6:56 AM Victoria Dye via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Victoria Dye <vdye@github.com>
>
> If `command_requires_full_index` is false, ensure correct in-core index
> sparsity on read by calling `ensure_correct_sparsity`. This change is meant
> to update the how the index is read in a command after sparse index-related

s/update the how/update how/ ?

> repository settings are modified. Previously, for example, if `index.sparse`
> were changed from `true` to `false`, the in-core index on the next command
> would be sparse. The index would only be expanded to full when it was next
> written to disk.
>
> By adding a call to `ensure_correct_sparsity`, the in-core index now matches
> the sparsity dictated by the relevant repository settings as soon as it is
> read into memory, rather than when it is later written to disk.

I split up reading this series across different days, and when I got
here, my first reaction was "Okay, but why would you want that?
Sounds like extra work for no gain."  I went and looked up the cover
letter and saw you mentioned that this "introduces the ability to
enable/disable the sparse index on a command-by-command basis (e.g.,
allowing a user to troubleshoot a sparse-aware command with '-c
index.sparse=false' [1]).  That seems like a good reason to me, and
sounds like it belongs in this commit message.  But it sounds like you
had other reasons in mind.  If so, could you share them; I'm having
difficulty understanding how this would make a difference other than
in the troubleshooting scenario.

>
> Helped-by: Junio C Hamano <gitster@pobox.com>
> Co-authored-by: Derrick Stolee <dstolee@microsoft.com>
> Signed-off-by: Victoria Dye <vdye@github.com>
> ---
>  read-cache.c                             |  8 ++++++
>  t/t1092-sparse-checkout-compatibility.sh | 31 ++++++++++++++++++++++++
>  2 files changed, 39 insertions(+)
>
> diff --git a/read-cache.c b/read-cache.c
> index a78b88a41bf..b3772ba70a1 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -2337,9 +2337,17 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
>
>         if (!istate->repo)
>                 istate->repo = the_repository;
> +
> +       /*
> +        * If the command explicitly requires a full index, force it
> +        * to be full. Otherwise, correct the sparsity based on repository
> +        * settings and other properties of the index (if necessary).
> +        */
>         prepare_repo_settings(istate->repo);
>         if (istate->repo->settings.command_requires_full_index)
>                 ensure_full_index(istate);
> +       else
> +               ensure_correct_sparsity(istate);
>
>         return istate->cache_nr;
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index ca91c6a67f8..59accde1fa3 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -694,6 +694,37 @@ test_expect_success 'sparse-index is expanded and converted back' '
>         test_region index ensure_full_index trace2.txt
>  '
>
> +test_expect_success 'index.sparse disabled inline uses full index' '
> +       init_repos &&
> +
> +       # When index.sparse is disabled inline with `git status`, the
> +       # index is expanded at the beginning of the execution then never
> +       # converted back to sparse. It is then written to disk as a full index.
> +       rm -f trace2.txt &&
> +       GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
> +               git -C sparse-index -c index.sparse=false status &&
> +       ! test_region index convert_to_sparse trace2.txt &&
> +       test_region index ensure_full_index trace2.txt &&
> +
> +       # Since index.sparse is set to true at a repo level, the index
> +       # is converted from full to sparse when read, then never expanded
> +       # over the course of `git status`. It is written to disk as a sparse
> +       # index.
> +       rm -f trace2.txt &&
> +       GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
> +               git -C sparse-index status &&
> +       test_region index convert_to_sparse trace2.txt &&
> +       ! test_region index ensure_full_index trace2.txt &&
> +
> +       # Now that the index has been written to disk as sparse, it is not
> +       # converted to sparse (or expanded to full) when read by `git status`.
> +       rm -f trace2.txt &&
> +       GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
> +               git -C sparse-index status &&
> +       ! test_region index convert_to_sparse trace2.txt &&
> +       ! test_region index ensure_full_index trace2.txt
> +'
> +
>  ensure_not_expanded () {
>         rm -f trace2.txt &&
>         echo >>sparse-index/untracked.txt &&
> --
> gitgitgadget

The rest of the patch looks fine.
Victoria Dye Nov. 22, 2021, 6:59 p.m. UTC | #2
Elijah Newren wrote:
> On Fri, Oct 29, 2021 at 6:56 AM Victoria Dye via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Victoria Dye <vdye@github.com>
>>
>> If `command_requires_full_index` is false, ensure correct in-core index
>> sparsity on read by calling `ensure_correct_sparsity`. This change is meant
>> to update the how the index is read in a command after sparse index-related
> 
> s/update the how/update how/ ?
> 

This was probably the result of mixing up "update the way" and "update how".
I'll go with the latter.

>> repository settings are modified. Previously, for example, if `index.sparse`
>> were changed from `true` to `false`, the in-core index on the next command
>> would be sparse. The index would only be expanded to full when it was next
>> written to disk.
>>
>> By adding a call to `ensure_correct_sparsity`, the in-core index now matches
>> the sparsity dictated by the relevant repository settings as soon as it is
>> read into memory, rather than when it is later written to disk.
> 
> I split up reading this series across different days, and when I got
> here, my first reaction was "Okay, but why would you want that?
> Sounds like extra work for no gain."  I went and looked up the cover
> letter and saw you mentioned that this "introduces the ability to
> enable/disable the sparse index on a command-by-command basis (e.g.,
> allowing a user to troubleshoot a sparse-aware command with '-c
> index.sparse=false' [1]).  That seems like a good reason to me, and
> sounds like it belongs in this commit message.  But it sounds like you
> had other reasons in mind.  If so, could you share them; I'm having
> difficulty understanding how this would make a difference other than
> in the troubleshooting scenario.
> 

The troubleshooting scenario was what inspired this change in the first
place, but it applies any time someone changes repository settings outside
of `git sparse-checkout init`. One relevant scenario I can imagine is if a
user enables `index.sparse, commands would not use the sparse index until a
command that *writes* the index is executed:

$ git config index.sparse true
$ git diff      # read full index, full in-core, no write
$ git add .     # read full index, full in-core, write sparse
$ git diff      # read sparse index, sparse in-core, no write

Another scenario might be enabling `index.sparse`, then running a command
that turns out to be surprisingly slow because it's operating on a full
index in-core (i.e., much slower than the convert_to_sparse + command with
sparse index would be). 

These are definitely edge cases - they rely on manual config changes, and
they're only really noticeable 1) if the config change is immediately
followed by index read-only commands and 2) if the first index-writing
command after the config change is slow. That said, on the off chance that a
user (or future developer) encounter one of these scenarios, I think it'll
be helpful to have `index.sparse` take effect "immediately" after the config
change.

I'll include these (and the troubleshooting) examples in the updated commit
message.

>>
>> Helped-by: Junio C Hamano <gitster@pobox.com>
>> Co-authored-by: Derrick Stolee <dstolee@microsoft.com>
>> Signed-off-by: Victoria Dye <vdye@github.com>
>> ---
>>  read-cache.c                             |  8 ++++++
>>  t/t1092-sparse-checkout-compatibility.sh | 31 ++++++++++++++++++++++++
>>  2 files changed, 39 insertions(+)
>>
>> diff --git a/read-cache.c b/read-cache.c
>> index a78b88a41bf..b3772ba70a1 100644
>> --- a/read-cache.c
>> +++ b/read-cache.c
>> @@ -2337,9 +2337,17 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
>>
>>         if (!istate->repo)
>>                 istate->repo = the_repository;
>> +
>> +       /*
>> +        * If the command explicitly requires a full index, force it
>> +        * to be full. Otherwise, correct the sparsity based on repository
>> +        * settings and other properties of the index (if necessary).
>> +        */
>>         prepare_repo_settings(istate->repo);
>>         if (istate->repo->settings.command_requires_full_index)
>>                 ensure_full_index(istate);
>> +       else
>> +               ensure_correct_sparsity(istate);
>>
>>         return istate->cache_nr;
>>
>> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
>> index ca91c6a67f8..59accde1fa3 100755
>> --- a/t/t1092-sparse-checkout-compatibility.sh
>> +++ b/t/t1092-sparse-checkout-compatibility.sh
>> @@ -694,6 +694,37 @@ test_expect_success 'sparse-index is expanded and converted back' '
>>         test_region index ensure_full_index trace2.txt
>>  '
>>
>> +test_expect_success 'index.sparse disabled inline uses full index' '
>> +       init_repos &&
>> +
>> +       # When index.sparse is disabled inline with `git status`, the
>> +       # index is expanded at the beginning of the execution then never
>> +       # converted back to sparse. It is then written to disk as a full index.
>> +       rm -f trace2.txt &&
>> +       GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
>> +               git -C sparse-index -c index.sparse=false status &&
>> +       ! test_region index convert_to_sparse trace2.txt &&
>> +       test_region index ensure_full_index trace2.txt &&
>> +
>> +       # Since index.sparse is set to true at a repo level, the index
>> +       # is converted from full to sparse when read, then never expanded
>> +       # over the course of `git status`. It is written to disk as a sparse
>> +       # index.
>> +       rm -f trace2.txt &&
>> +       GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
>> +               git -C sparse-index status &&
>> +       test_region index convert_to_sparse trace2.txt &&
>> +       ! test_region index ensure_full_index trace2.txt &&
>> +
>> +       # Now that the index has been written to disk as sparse, it is not
>> +       # converted to sparse (or expanded to full) when read by `git status`.
>> +       rm -f trace2.txt &&
>> +       GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
>> +               git -C sparse-index status &&
>> +       ! test_region index convert_to_sparse trace2.txt &&
>> +       ! test_region index ensure_full_index trace2.txt
>> +'
>> +
>>  ensure_not_expanded () {
>>         rm -f trace2.txt &&
>>         echo >>sparse-index/untracked.txt &&
>> --
>> gitgitgadget
> 
> The rest of the patch looks fine.
>
diff mbox series

Patch

diff --git a/read-cache.c b/read-cache.c
index a78b88a41bf..b3772ba70a1 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2337,9 +2337,17 @@  int do_read_index(struct index_state *istate, const char *path, int must_exist)
 
 	if (!istate->repo)
 		istate->repo = the_repository;
+
+	/*
+	 * If the command explicitly requires a full index, force it
+	 * to be full. Otherwise, correct the sparsity based on repository
+	 * settings and other properties of the index (if necessary).
+	 */
 	prepare_repo_settings(istate->repo);
 	if (istate->repo->settings.command_requires_full_index)
 		ensure_full_index(istate);
+	else
+		ensure_correct_sparsity(istate);
 
 	return istate->cache_nr;
 
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index ca91c6a67f8..59accde1fa3 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -694,6 +694,37 @@  test_expect_success 'sparse-index is expanded and converted back' '
 	test_region index ensure_full_index trace2.txt
 '
 
+test_expect_success 'index.sparse disabled inline uses full index' '
+	init_repos &&
+
+	# When index.sparse is disabled inline with `git status`, the
+	# index is expanded at the beginning of the execution then never
+	# converted back to sparse. It is then written to disk as a full index.
+	rm -f trace2.txt &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
+		git -C sparse-index -c index.sparse=false status &&
+	! test_region index convert_to_sparse trace2.txt &&
+	test_region index ensure_full_index trace2.txt &&
+
+	# Since index.sparse is set to true at a repo level, the index
+	# is converted from full to sparse when read, then never expanded
+	# over the course of `git status`. It is written to disk as a sparse
+	# index.
+	rm -f trace2.txt &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
+		git -C sparse-index status &&
+	test_region index convert_to_sparse trace2.txt &&
+	! test_region index ensure_full_index trace2.txt &&
+
+	# Now that the index has been written to disk as sparse, it is not
+	# converted to sparse (or expanded to full) when read by `git status`.
+	rm -f trace2.txt &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
+		git -C sparse-index status &&
+	! test_region index convert_to_sparse trace2.txt &&
+	! test_region index ensure_full_index trace2.txt
+'
+
 ensure_not_expanded () {
 	rm -f trace2.txt &&
 	echo >>sparse-index/untracked.txt &&