diff mbox series

[v3,3/8] update-index: add --force-full-index option for expand/collapse test

Message ID 014a408ea5d9894197c60f8d712749ea3cc39c9d.1633641339.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series Sparse Index: integrate with reset | expand

Commit Message

Victoria Dye Oct. 7, 2021, 9:15 p.m. UTC
From: Victoria Dye <vdye@github.com>

Add a new `--force-full-index` option to `git update-index`, which skips
explicitly setting `command_requires_full_index`. This option, intended for
use in internal testing purposes only, lets `git update-index` run as a
command without sparse index compatibility implemented, even after it
receives updates to otherwise use the sparse index.

The specific test `--force-full-index` is intended for - `t1092 -
sparse-index is expanded and converted back` - verifies index compatibility
in commands that do not change the default (enabled)
`command_requires_full_index` repo setting. In the past, the test used `git
reset`. However, as `reset` and other commands are integrated with the
sparse index, the command used in the test would need to keep changing.
Conversely, the `--force-full-index` option makes `git update-index` behave
like a not-yet-sparse-aware command, and can be used in the test
indefinitely without interfering with future sparse index integrations.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 Documentation/git-update-index.txt       |  5 +++++
 builtin/update-index.c                   | 11 +++++++++++
 t/t1092-sparse-checkout-compatibility.sh |  2 +-
 3 files changed, 17 insertions(+), 1 deletion(-)

Comments

Bagas Sanjaya Oct. 8, 2021, 2:50 a.m. UTC | #1
On 08/10/21 04.15, Victoria Dye via GitGitGadget wrote:
> From: Victoria Dye <vdye@github.com>
> 
> Add a new `--force-full-index` option to `git update-index`, which skips
> explicitly setting `command_requires_full_index`. This option, intended for
> use in internal testing purposes only, lets `git update-index` run as a
> command without sparse index compatibility implemented, even after it
> receives updates to otherwise use the sparse index.
> 
> The specific test `--force-full-index` is intended for - `t1092 -
> sparse-index is expanded and converted back` - verifies index compatibility
> in commands that do not change the default (enabled)
> `command_requires_full_index` repo setting. In the past, the test used `git
> reset`. However, as `reset` and other commands are integrated with the
> sparse index, the command used in the test would need to keep changing.
> Conversely, the `--force-full-index` option makes `git update-index` behave
> like a not-yet-sparse-aware command, and can be used in the test
> indefinitely without interfering with future sparse index integrations.
> 
> Helped-by: Junio C Hamano <gitster@pobox.com>
> Signed-off-by: Victoria Dye <vdye@github.com>

Grammar looks OK.

Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Junio C Hamano Oct. 8, 2021, 5:24 a.m. UTC | #2
"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +	/*
> +	 * If --force-full-index is set, the command should skip manually
> +	 * setting `command_requires_full_index`.
> +	 */

Hmph, doesn't that feel unnaturally backwards, though?

The settings.command_requires_full_index bit forces read-cache to
call ensure_full_index() immediately after the in-core index is read
from the disk.  If we are forcing operating on the full index, I'd
imagine that we'd be making sure that ensure_full_index() to be
called.

I do not see anything in the code that ensures active_cache_changed
to be flipped on.  So the new test that says

    git -C sparse-index -c core.fsmonitor="" update-index --force-full-index

may not call ensure_full_index(), but because nothing marks
the_index as changed, I think we won't call write_locked_index() at
the end of cmd_update_index().  IOW, what we have in the test patch
may be an expensive noop, no?

Or perhaps I am reading the patch completely incorrectly.  I dunno.

> +	prepare_repo_settings(r);
> +	if (!use_default_full_index)
> +		r->settings.command_requires_full_index = 1;
> +
>  	entries = read_cache();
>  	if (entries < 0)
>  		die("cache corrupted");
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 889079f55b8..4aa4fef7b4f 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -635,7 +635,7 @@ test_expect_success 'sparse-index is expanded and converted back' '
>  	init_repos &&
>  
>  	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
> -		git -C sparse-index -c core.fsmonitor="" reset --hard &&
> +		git -C sparse-index -c core.fsmonitor="" update-index --force-full-index &&
>  	test_region index convert_to_sparse trace2.txt &&
>  	test_region index ensure_full_index trace2.txt
>  '
Victoria Dye Oct. 8, 2021, 3:47 p.m. UTC | #3
Junio C Hamano wrote:
> "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> +	/*
>> +	 * If --force-full-index is set, the command should skip manually
>> +	 * setting `command_requires_full_index`.
>> +	 */
> 
> Hmph, doesn't that feel unnaturally backwards, though?
> 
> The settings.command_requires_full_index bit forces read-cache to
> call ensure_full_index() immediately after the in-core index is read
> from the disk.  If we are forcing operating on the full index, I'd
> imagine that we'd be making sure that ensure_full_index() to be
> called.
> 

I tried coming up with a user-facing name that wasn't too focused on the
internal implementation, but it ends up being misleading. The intention was
to have this be a variation of `git update-index` that uses the default
setting for `command_requires_full_index` but then proceeds to read and
write the index as `update-index` normally would. Something like
`--use-default-index-sparsity` might have been more accurate?

> I do not see anything in the code that ensures active_cache_changed
> to be flipped on.  So the new test that says
> 
>     git -C sparse-index -c core.fsmonitor="" update-index --force-full-index
> 
> may not call ensure_full_index(), but because nothing marks
> the_index as changed, I think we won't call write_locked_index() at
> the end of cmd_update_index().  IOW, what we have in the test patch
> may be an expensive noop, no?
> 

In the test's use-case, `active_cache_changed` ends up set to
`CACHE_TREE_CHANGED`, which forces writing the index. It is still
effectively a no-op, but it serves the needs of the test.

In any case, Elijah suggested using a `test-tool` subcommand for this
purpose [1], which I think is more appropriate overall. Something like
`test-tool read-write-cache` can be implemented to make no mention of
`command_requires_full_index` (therefore using its default value) and force
a basic read & write of the index. It also eliminates the issue of having a
user-facing name at all, and can easily be removed once all sparse index
integrations are done.

[1] https://lore.kernel.org/git/CABPp-BF+bEUcyE0N79uRCkpCayJx_NMqOpnMSHHrpJM5a9hAWw@mail.gmail.com/
Junio C Hamano Oct. 8, 2021, 5:19 p.m. UTC | #4
Victoria Dye <vdye@github.com> writes:

> I tried coming up with a user-facing name that wasn't too focused on the
> internal implementation, but it ends up being misleading. The intention was
> to have this be a variation of `git update-index` that uses the default
> setting for `command_requires_full_index` but then proceeds to read and
> write the index as `update-index` normally would. Something like
> `--use-default-index-sparsity` might have been more accurate?

The option name in the reviewed patch does imply "we force expanding
to full" and not "use the default", so it probably needs renaming,
if we want the "use the default" semantics.  But is that useful in
the context of the test you are using it in place of "reset" or "mv"?
Even if the default is somehow flipped to use sparse always, wouldn't
the particular test want the index expanded?  I dunno.

> In the test's use-case, `active_cache_changed` ends up set to
> `CACHE_TREE_CHANGED`, which forces writing the index. It is still
> effectively a no-op, but it serves the needs of the test.

Ah, cache-tree is updated, then it's OK.

As to test-tool vs end-user-accessible-command, I do not have a
strong opinion, but use your imagination and ask Derrick or somebody
else for their imagination to see if such a "force expand" feature
may be something the end-users might need an access to in order to
dig themselves out of a hole (in which case, it may be better to
make it end-user-accessible) or not (in which case, test-tool is
more appropriate).

Thanks.
Derrick Stolee Oct. 11, 2021, 2:12 p.m. UTC | #5
On 10/8/21 1:19 PM, Junio C Hamano wrote:
> Victoria Dye <vdye@github.com> writes:
> 
>> I tried coming up with a user-facing name that wasn't too focused on the
>> internal implementation, but it ends up being misleading. The intention was
>> to have this be a variation of `git update-index` that uses the default
>> setting for `command_requires_full_index` but then proceeds to read and
>> write the index as `update-index` normally would. Something like
>> `--use-default-index-sparsity` might have been more accurate?
> 
> The option name in the reviewed patch does imply "we force expanding
> to full" and not "use the default", so it probably needs renaming,
> if we want the "use the default" semantics.  But is that useful in
> the context of the test you are using it in place of "reset" or "mv"?
> Even if the default is somehow flipped to use sparse always, wouldn't
> the particular test want the index expanded?  I dunno.
> 
>> In the test's use-case, `active_cache_changed` ends up set to
>> `CACHE_TREE_CHANGED`, which forces writing the index. It is still
>> effectively a no-op, but it serves the needs of the test.
> 
> Ah, cache-tree is updated, then it's OK.
> 
> As to test-tool vs end-user-accessible-command, I do not have a
> strong opinion, but use your imagination and ask Derrick or somebody
> else for their imagination to see if such a "force expand" feature
> may be something the end-users might need an access to in order to
> dig themselves out of a hole (in which case, it may be better to
> make it end-user-accessible) or not (in which case, test-tool is
> more appropriate).

I think there is something to be said about the name being confusing,
because the current implementation focuses on "expand a sparse index
upon read" but it also allows the index to be written as sparse.

Conversely, if the user runs

  git -c index.sparse=false update-index ...

then the index.sparse config setting forbids conversion from full to
sparse, but does not say anything about expanding to full.

Perhaps this should be corrected: the index.sparse=false setting
should expand a sparse index to a full one, then prevent it from
being converted to a sparse one on write.

This diff should do it:

diff --git a/read-cache.c b/read-cache.c
index 564283c7e7e..04df1051e18 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2376,7 +2376,8 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 	if (!istate->repo)
 		istate->repo = the_repository;
 	prepare_repo_settings(istate->repo);
-	if (istate->repo->settings.command_requires_full_index)
+	if (!istate->repo->settings.sparse_index ||
+	    istate->repo->settings.command_requires_full_index)
 		ensure_full_index(istate);
 
 	return istate->cache_nr;

Victoria, what are your thoughts about including such a change?

Junio, would it be better to change the config setting, and then
update this test to use the config setting over a command-line flag?
This would allow us to punt on the --force-full-index flag until we
have time to focus on the 'git update-index' command and interactions
like this.

Thanks,
-Stolee
Victoria Dye Oct. 11, 2021, 3:05 p.m. UTC | #6
Derrick Stolee wrote:
> On 10/8/21 1:19 PM, Junio C Hamano wrote:
>> Victoria Dye <vdye@github.com> writes:
>>
>>> I tried coming up with a user-facing name that wasn't too focused on the
>>> internal implementation, but it ends up being misleading. The intention was
>>> to have this be a variation of `git update-index` that uses the default
>>> setting for `command_requires_full_index` but then proceeds to read and
>>> write the index as `update-index` normally would. Something like
>>> `--use-default-index-sparsity` might have been more accurate?
>>
>> The option name in the reviewed patch does imply "we force expanding
>> to full" and not "use the default", so it probably needs renaming,
>> if we want the "use the default" semantics.  But is that useful in
>> the context of the test you are using it in place of "reset" or "mv"?
>> Even if the default is somehow flipped to use sparse always, wouldn't
>> the particular test want the index expanded?  I dunno.
>>
>>> In the test's use-case, `active_cache_changed` ends up set to
>>> `CACHE_TREE_CHANGED`, which forces writing the index. It is still
>>> effectively a no-op, but it serves the needs of the test.
>>
>> Ah, cache-tree is updated, then it's OK.
>>
>> As to test-tool vs end-user-accessible-command, I do not have a
>> strong opinion, but use your imagination and ask Derrick or somebody
>> else for their imagination to see if such a "force expand" feature
>> may be something the end-users might need an access to in order to
>> dig themselves out of a hole (in which case, it may be better to
>> make it end-user-accessible) or not (in which case, test-tool is
>> more appropriate).
> 
> I think there is something to be said about the name being confusing,
> because the current implementation focuses on "expand a sparse index
> upon read" but it also allows the index to be written as sparse.
> 

This helps clarify what I was misinterpreting in the test. It isn't looking
for "default" behavior, it's verifying whether trace2 logs capture index
expansion and collapse when those operations are expected to happen,
regardless of whether that's because `command_requires_full_index` is 1 or
because the command needs to use entries inside of sparse directories. With
that interpretation, I can replace the command with `git reset
update-folder1 -- folder1/a` and get the same result (without needing to
change the test in the future *or* add a new `git` command option /
`test-tool` subcommand).

> Conversely, if the user runs
> 
>   git -c index.sparse=false update-index ...
> 
> then the index.sparse config setting forbids conversion from full to
> sparse, but does not say anything about expanding to full.
> 
> Perhaps this should be corrected: the index.sparse=false setting
> should expand a sparse index to a full one, then prevent it from
> being converted to a sparse one on write.
> 
> This diff should do it:
> 
> diff --git a/read-cache.c b/read-cache.c
> index 564283c7e7e..04df1051e18 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -2376,7 +2376,8 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
>  	if (!istate->repo)
>  		istate->repo = the_repository;
>  	prepare_repo_settings(istate->repo);
> -	if (istate->repo->settings.command_requires_full_index)
> +	if (!istate->repo->settings.sparse_index ||
> +	    istate->repo->settings.command_requires_full_index)
>  		ensure_full_index(istate);
>  
>  	return istate->cache_nr;
> 
> Victoria, what are your thoughts about including such a change?
> 

I think this is a worthwhile change, but I'd prefer submitting it separately
(either in an upcoming sparse index integration or on its own). It's not
directly needed by anything in this series, and I'd like to avoid adding
features to the scope if possible.
Junio C Hamano Oct. 11, 2021, 3:24 p.m. UTC | #7
Derrick Stolee <stolee@gmail.com> writes:

> Junio, would it be better to change the config setting, and then
> update this test to use the config setting over a command-line flag?
> This would allow us to punt on the --force-full-index flag until we
> have time to focus on the 'git update-index' command and interactions
> like this.

I do not have a strong opinion on where we add the feature; as long
as we have a way to let us avoid having to unnecessarily change this
particular test, that's perfectly fine, and if we can reuse it as a
way for end-users to help those who are debugging their issues, that
would be an added bonus.

Thanks.
diff mbox series

Patch

diff --git a/Documentation/git-update-index.txt b/Documentation/git-update-index.txt
index 2853f168d97..06255e321a3 100644
--- a/Documentation/git-update-index.txt
+++ b/Documentation/git-update-index.txt
@@ -24,6 +24,7 @@  SYNOPSIS
 	     [--[no-]fsmonitor]
 	     [--really-refresh] [--unresolve] [--again | -g]
 	     [--info-only] [--index-info]
+	     [--force-full-index]
 	     [-z] [--stdin] [--index-version <n>]
 	     [--verbose]
 	     [--] [<file>...]
@@ -170,6 +171,10 @@  time. Version 4 is relatively young (first released in 1.8.0 in
 October 2012). Other Git implementations such as JGit and libgit2
 may not support it yet.
 
+--force-full-index::
+	Force the command to operate on a full index, expanding a sparse
+	index if necessary.
+
 -z::
 	Only meaningful with `--stdin` or `--index-info`; paths are
 	separated with NUL character instead of LF.
diff --git a/builtin/update-index.c b/builtin/update-index.c
index 187203e8bb5..32ada3ead77 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -964,6 +964,7 @@  int cmd_update_index(int argc, const char **argv, const char *prefix)
 	int split_index = -1;
 	int force_write = 0;
 	int fsmonitor = -1;
+	int use_default_full_index = 0;
 	struct lock_file lock_file = LOCK_INIT;
 	struct parse_opt_ctx_t ctx;
 	strbuf_getline_fn getline_fn;
@@ -1069,6 +1070,8 @@  int cmd_update_index(int argc, const char **argv, const char *prefix)
 		{OPTION_SET_INT, 0, "no-fsmonitor-valid", &mark_fsmonitor_only, NULL,
 			N_("clear fsmonitor valid bit"),
 			PARSE_OPT_NOARG | PARSE_OPT_NONEG, NULL, UNMARK_FLAG},
+		OPT_SET_INT(0, "force-full-index", &use_default_full_index,
+			N_("run with full index explicitly required"), 1),
 		OPT_END()
 	};
 
@@ -1082,6 +1085,14 @@  int cmd_update_index(int argc, const char **argv, const char *prefix)
 	if (newfd < 0)
 		lock_error = errno;
 
+	/*
+	 * If --force-full-index is set, the command should skip manually
+	 * setting `command_requires_full_index`.
+	 */
+	prepare_repo_settings(r);
+	if (!use_default_full_index)
+		r->settings.command_requires_full_index = 1;
+
 	entries = read_cache();
 	if (entries < 0)
 		die("cache corrupted");
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 889079f55b8..4aa4fef7b4f 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -635,7 +635,7 @@  test_expect_success 'sparse-index is expanded and converted back' '
 	init_repos &&
 
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
-		git -C sparse-index -c core.fsmonitor="" reset --hard &&
+		git -C sparse-index -c core.fsmonitor="" update-index --force-full-index &&
 	test_region index convert_to_sparse trace2.txt &&
 	test_region index ensure_full_index trace2.txt
 '