mbox series

[v2,0/7] Sparse Index: integrate with reset

Message ID pull.1048.v2.git.1633440057.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series Sparse Index: integrate with reset | expand

Message

Bruce Perry via GitGitGadget Oct. 5, 2021, 1:20 p.m. UTC
This series integrates the sparse index with git reset and provides
miscellaneous fixes and improvements to the command in sparse checkouts.
This includes:

 1. tests added to t1092 and p2000 to establish the baseline functionality
    of the command
 2. repository settings to enable the sparse index with ensure_full_index
    guarding any code paths that break tests without other compatibility
    updates.
 3. modifications to remove or reduce the scope in which ensure_full_index
    must be called.

The sparse index updates are predicated on a fix originating from the
microsoft/git fork [1], correcting how git reset --mixed handles resetting
entries outside the sparse checkout definition. Additionally, a performance
"bug" in next_cache_entry with sparse index is corrected, preventing
repeatedly looping over already-searched entries.

The p2000 tests demonstrate an overall ~70% execution time reduction across
all tested usages of git reset using a sparse index:

Test                                               before   after       
------------------------------------------------------------------------
2000.22: git reset (full-v3)                       0.48     0.51 +6.3% 
2000.23: git reset (full-v4)                       0.47     0.50 +6.4% 
2000.24: git reset (sparse-v3)                     0.93     0.30 -67.7%
2000.25: git reset (sparse-v4)                     0.94     0.29 -69.1%
2000.26: git reset --hard (full-v3)                0.69     0.68 -1.4% 
2000.27: git reset --hard (full-v4)                0.75     0.68 -9.3% 
2000.28: git reset --hard (sparse-v3)              1.29     0.34 -73.6%
2000.29: git reset --hard (sparse-v4)              1.31     0.34 -74.0%
2000.30: git reset -- does-not-exist (full-v3)     0.54     0.51 -5.6% 
2000.31: git reset -- does-not-exist (full-v4)     0.54     0.52 -3.7% 
2000.32: git reset -- does-not-exist (sparse-v3)   1.02     0.31 -69.6%
2000.33: git reset -- does-not-exist (sparse-v4)   1.07     0.30 -72.0%



Changes since V1
================

 * Add --force-full-index option to update-index. The option is used
   circumvent changing command_requires_full_index from its default value -
   right now this is effectively a no-op, but will change once update-index
   is integrated with sparse index. By using this option in the t1092
   expand/collapse test, the command used to test will not need to be
   updated with subsequent sparse index integrations.
 * Update implementation of mixed reset for entries outside sparse checkout
   definition. The condition in which a file should be checked out before
   index reset is simplified to "if it has skip-worktree enabled and a reset
   would change the file, check it out".
   * After checking the behavior of update_index_from_diff with renames,
     found that the diff used by reset does not produce diff queue entries
     with different pathnames for one and two. Because of this, and that
     nothing in the implementation seems to rely on identical path names, no
     BUG check is added.
 * Correct a bug in the sparse index is not expanded tests in t1092 where
   failure of a git reset --mixed test was not being reported. Test now
   verifies an appropriate scenario with corrected failure-checking.

Thanks! -Victoria

[1] microsoft@6b8a074

Kevin Willford (1):
  reset: behave correctly with sparse-checkout

Victoria Dye (6):
  update-index: add --force-full-index option for expand/collapse test
  reset: expand test coverage for sparse checkouts
  reset: integrate with sparse index
  reset: make sparse-aware (except --mixed)
  reset: make --mixed sparse-aware
  unpack-trees: improve performance of next_cache_entry

 Documentation/git-update-index.txt       |   5 +
 builtin/reset.c                          |  62 +++++++++-
 builtin/update-index.c                   |  11 ++
 cache-tree.c                             |  43 ++++++-
 cache.h                                  |  10 ++
 read-cache.c                             |  22 ++--
 t/perf/p2000-sparse-operations.sh        |   3 +
 t/t1092-sparse-checkout-compatibility.sh | 139 ++++++++++++++++++++++-
 t/t7114-reset-sparse-checkout.sh         |  61 ++++++++++
 unpack-trees.c                           |  23 +++-
 10 files changed, 351 insertions(+), 28 deletions(-)
 create mode 100755 t/t7114-reset-sparse-checkout.sh


base-commit: cefe983a320c03d7843ac78e73bd513a27806845
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1048%2Fvdye%2Fvdye%2Fsparse-index-part1-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1048/vdye/vdye/sparse-index-part1-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1048

Range-diff vs v1:

 1:  65905bf4e00 ! 1:  22c69bc6030 reset: behave correctly with sparse-checkout
     @@ builtin/reset.c
       #define REFRESH_INDEX_DELAY_WARNING_IN_MS (2 * 1000)
       
      @@ builtin/reset.c: static void update_index_from_diff(struct diff_queue_struct *q,
     - 		struct diff_options *opt, void *data)
     - {
     - 	int i;
     -+	int pos;
       	int intent_to_add = *(int *)data;
       
       	for (i = 0; i < q->nr; i++) {
     ++		int pos;
       		struct diff_filespec *one = q->queue[i]->one;
     +-		int is_missing = !(one->mode && !is_null_oid(&one->oid));
      +		struct diff_filespec *two = q->queue[i]->two;
     - 		int is_missing = !(one->mode && !is_null_oid(&one->oid));
     -+		int was_missing = !two->mode && is_null_oid(&two->oid);
     ++		int is_in_reset_tree = one->mode && !is_null_oid(&one->oid);
       		struct cache_entry *ce;
     -+		struct cache_entry *ce_before;
     -+		struct checkout state = CHECKOUT_INIT;
     -+
     + 
     +-		if (is_missing && !intent_to_add) {
      +		/*
     -+		 * When using the sparse-checkout feature the cache entries
     -+		 * that are added here will not have the skip-worktree bit
     -+		 * set. Without this code there is data that is lost because
     -+		 * the files that would normally be in the working directory
     -+		 * are not there and show as deleted for the next status.
     -+		 * In the case of added files, they just disappear.
     -+		 *
     -+		 * We need to create the previous version of the files in
     -+		 * the working directory so that they will have the right
     -+		 * content and the next status call will show modified or
     -+		 * untracked files correctly.
     ++		 * If the file being reset has `skip-worktree` enabled, we need
     ++		 * to check it out to prevent the file from being hard reset.
      +		 */
     -+		if (core_apply_sparse_checkout && !file_exists(two->path)) {
     -+			pos = cache_name_pos(two->path, strlen(two->path));
     -+			if ((pos >= 0 && ce_skip_worktree(active_cache[pos])) &&
     -+			    (is_missing || !was_missing)) {
     -+				state.force = 1;
     -+				state.refresh_cache = 1;
     -+				state.istate = &the_index;
     -+
     -+				ce_before = make_cache_entry(&the_index, two->mode,
     -+							     &two->oid, two->path,
     -+							     0, 0);
     -+				if (!ce_before)
     -+					die(_("make_cache_entry failed for path '%s'"),
     -+						two->path);
     ++		pos = cache_name_pos(two->path, strlen(two->path));
     ++		if (pos >= 0 && ce_skip_worktree(active_cache[pos])) {
     ++			struct checkout state = CHECKOUT_INIT;
     ++			state.force = 1;
     ++			state.refresh_cache = 1;
     ++			state.istate = &the_index;
      +
     -+				checkout_entry(ce_before, &state, NULL, NULL);
     -+			}
     ++			checkout_entry(active_cache[pos], &state, NULL, NULL);
      +		}
     - 
     - 		if (is_missing && !intent_to_add) {
     ++
     ++		if (!is_in_reset_tree && !intent_to_add) {
       			remove_file_from_cache(one->path);
     + 			continue;
     + 		}
     +@@ builtin/reset.c: static void update_index_from_diff(struct diff_queue_struct *q,
     + 		if (!ce)
     + 			die(_("make_cache_entry failed for path '%s'"),
     + 			    one->path);
     +-		if (is_missing) {
     ++		if (!is_in_reset_tree) {
     + 			ce->ce_flags |= CE_INTENT_TO_ADD;
     + 			set_object_name_for_intent_to_add_entry(ce);
     + 		}
      
       ## t/t1092-sparse-checkout-compatibility.sh ##
      @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_failure 'blame with pathspec outside sparse definition' '
 2:  a1fa7c080ae ! 2:  f7cb9013d46 sparse-index: update command for expand/collapse test
     @@ Metadata
      Author: Victoria Dye <vdye@github.com>
      
       ## Commit message ##
     -    sparse-index: update command for expand/collapse test
     +    update-index: add --force-full-index option for expand/collapse test
      
     -    In anticipation of multiple commands being fully integrated with sparse
     -    index, update the test for index expand and collapse for non-sparse index
     -    integrated commands to use `mv`.
     +    Add a new `--force-full-index` option to `git update-index`, which skips
     +    explicitly setting `command_requires_full_index`. This lets `git
     +    update-index --force-full-index` run as a command without sparse index
     +    compatibility implemented, even after it receives sparse index compatibility
     +    updates.
     +
     +    By using `git update-index --force-full-index` in the `t1092` test
     +    `sparse-index is expanded and converted back`, commands can continue to
     +    integrate with the sparse index without the need to keep modifying the
     +    command used in the test.
      
          Signed-off-by: Victoria Dye <vdye@github.com>
      
     + ## Documentation/git-update-index.txt ##
     +@@ Documentation/git-update-index.txt: SYNOPSIS
     + 	     [--[no-]fsmonitor]
     + 	     [--really-refresh] [--unresolve] [--again | -g]
     + 	     [--info-only] [--index-info]
     ++	     [--force-full-index]
     + 	     [-z] [--stdin] [--index-version <n>]
     + 	     [--verbose]
     + 	     [--] [<file>...]
     +@@ Documentation/git-update-index.txt: time. Version 4 is relatively young (first released in 1.8.0 in
     + October 2012). Other Git implementations such as JGit and libgit2
     + may not support it yet.
     + 
     ++--force-full-index::
     ++	Force the command to operate on a full index, expanding a sparse
     ++	index if necessary.
     ++
     + -z::
     + 	Only meaningful with `--stdin` or `--index-info`; paths are
     + 	separated with NUL character instead of LF.
     +
     + ## builtin/update-index.c ##
     +@@ builtin/update-index.c: int cmd_update_index(int argc, const char **argv, const char *prefix)
     + 	int split_index = -1;
     + 	int force_write = 0;
     + 	int fsmonitor = -1;
     ++	int use_default_full_index = 0;
     + 	struct lock_file lock_file = LOCK_INIT;
     + 	struct parse_opt_ctx_t ctx;
     + 	strbuf_getline_fn getline_fn;
     +@@ builtin/update-index.c: int cmd_update_index(int argc, const char **argv, const char *prefix)
     + 		{OPTION_SET_INT, 0, "no-fsmonitor-valid", &mark_fsmonitor_only, NULL,
     + 			N_("clear fsmonitor valid bit"),
     + 			PARSE_OPT_NOARG | PARSE_OPT_NONEG, NULL, UNMARK_FLAG},
     ++		OPT_SET_INT(0, "force-full-index", &use_default_full_index,
     ++			N_("run with full index explicitly required"), 1),
     + 		OPT_END()
     + 	};
     + 
     +@@ builtin/update-index.c: int cmd_update_index(int argc, const char **argv, const char *prefix)
     + 	if (newfd < 0)
     + 		lock_error = errno;
     + 
     ++	/*
     ++	 * If --force-full-index is set, the command should skip manually
     ++	 * setting `command_requires_full_index`.
     ++	 */
     ++	prepare_repo_settings(r);
     ++	if (!use_default_full_index)
     ++		r->settings.command_requires_full_index = 1;
     ++
     + 	entries = read_cache();
     + 	if (entries < 0)
     + 		die("cache corrupted");
     +
       ## t/t1092-sparse-checkout-compatibility.sh ##
      @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse-index is expanded and converted back' '
       	init_repos &&
       
       	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
      -		git -C sparse-index -c core.fsmonitor="" reset --hard &&
     -+		git -C sparse-index -c core.fsmonitor="" mv a b &&
     ++		git -C sparse-index -c core.fsmonitor="" update-index --force-full-index &&
       	test_region index convert_to_sparse trace2.txt &&
       	test_region index ensure_full_index trace2.txt
       '
 3:  d033c5e365f = 3:  c7e9d9f4e03 reset: expand test coverage for sparse checkouts
 4:  2d63a250637 = 4:  49813c8d9ed reset: integrate with sparse index
 5:  e919e6d3270 = 5:  78cd85d8dcc reset: make sparse-aware (except --mixed)
 6:  e7cda32efb6 ! 6:  5eaae0825af reset: make --mixed sparse-aware
     @@ builtin/reset.c: static int read_from_tree(const struct pathspec *pathspec,
      
       ## t/t1092-sparse-checkout-compatibility.sh ##
      @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse-index is not expanded' '
     - 	for ref in update-deep update-folder1 update-folder2 update-deep
     - 	do
     - 		echo >>sparse-index/README.md &&
     -+		ensure_not_expanded reset --mixed $ref
       		ensure_not_expanded reset --hard $ref || return 1
       	done &&
       
     ++	ensure_not_expanded reset --mixed base &&
       	ensure_not_expanded reset --hard update-deep &&
       	ensure_not_expanded reset --keep base &&
       	ensure_not_expanded reset --merge update-deep &&
 7:  8637ec1660e = 7:  aa963eefae7 unpack-trees: improve performance of next_cache_entry

Comments

Ævar Arnfjörð Bjarmason Oct. 5, 2021, 3:34 p.m. UTC | #1
On Tue, Oct 05 2021, Victoria Dye via GitGitGadget wrote:

> The p2000 tests demonstrate an overall ~70% execution time reduction across
> all tested usages of git reset using a sparse index:

[...]

> Test                                               before   after       
> ------------------------------------------------------------------------
> 2000.22: git reset (full-v3)                       0.48     0.51 +6.3% 
> 2000.23: git reset (full-v4)                       0.47     0.50 +6.4% 
> 2000.24: git reset (sparse-v3)                     0.93     0.30 -67.7%
> 2000.25: git reset (sparse-v4)                     0.94     0.29 -69.1%
> 2000.26: git reset --hard (full-v3)                0.69     0.68 -1.4% 
> 2000.27: git reset --hard (full-v4)                0.75     0.68 -9.3% 
> 2000.28: git reset --hard (sparse-v3)              1.29     0.34 -73.6%
> 2000.29: git reset --hard (sparse-v4)              1.31     0.34 -74.0%
> 2000.30: git reset -- does-not-exist (full-v3)     0.54     0.51 -5.6% 
> 2000.31: git reset -- does-not-exist (full-v4)     0.54     0.52 -3.7% 
> 2000.32: git reset -- does-not-exist (sparse-v3)   1.02     0.31 -69.6%
> 2000.33: git reset -- does-not-exist (sparse-v4)   1.07     0.30 -72.0%

This series looks like it really improves some cases, but at the cost of
that -70% improvement we've got a ~5% regression in 7/7 for the full-v3
--does-not-exist cases. As noted in your 7/7 (which improves all other
cases):

    (full-v3)     0.79(0.38+0.30)   0.91(0.43+0.34) +15.2%
    (full-v4)     0.80(0.38+0.29)   0.85(0.40+0.35) +6.2%

Which b.t.w. I had to read a couple of times before realizig that its
quoted:
    
    Test          before            after
    ------------------------------------------------------
    (full-v3)     0.79(0.38+0.30)   0.91(0.43+0.34) +15.2%
    (full-v4)     0.80(0.38+0.29)   0.85(0.40+0.35) +6.2%
    (sparse-v3)   0.76(0.43+0.69)   0.44(0.08+0.67) -42.1%
    (sparse-v4)   0.71(0.40+0.65)   0.41(0.09+0.65) -42.3%

Is just the does-not-exist part of this bigger table, are the other
cases all ~0% changed, or ...?
    
Anyway, until 7/7 the v3 had been sped up, but a ~10% increase landed us
at ~+6%, and full-v4 had been ~0% but got ~6% worse?

Is there a way we can get those improvements in performance without
regressing on the full-* cases?

Also, these tests only check sparse performance, but isn't some of the
code being modified here general enough to not be used exclusively by
the sparse mode, full checkout cone or not?

It looks fairly easy to extend p2000-sparse-operations.sh to run the
same tests but just pretend that it's running in a "full" mode without
actually setting up anyting sparse-specific (the meat of those tests
just runs "git status" etc. How does that look with this series?

Since only the CL and 7/7 quote numbers from p2000, and 7/7 is at least
a partial regression, it would be nice to have perf numbers on each
commit (if only as a one-off for ML consumption). Are there any more
improvements followed by regressions followed by improvements as we go
along? Would be useful to know...
Victoria Dye Oct. 5, 2021, 8:44 p.m. UTC | #2
Ævar Arnfjörð Bjarmason wrote:
> 
> On Tue, Oct 05 2021, Victoria Dye via GitGitGadget wrote:
> 
>> The p2000 tests demonstrate an overall ~70% execution time reduction across
>> all tested usages of git reset using a sparse index:
> 
> [...]
> 
>> Test                                               before   after       
>> ------------------------------------------------------------------------
>> 2000.22: git reset (full-v3)                       0.48     0.51 +6.3% 
>> 2000.23: git reset (full-v4)                       0.47     0.50 +6.4% 
>> 2000.24: git reset (sparse-v3)                     0.93     0.30 -67.7%
>> 2000.25: git reset (sparse-v4)                     0.94     0.29 -69.1%
>> 2000.26: git reset --hard (full-v3)                0.69     0.68 -1.4% 
>> 2000.27: git reset --hard (full-v4)                0.75     0.68 -9.3% 
>> 2000.28: git reset --hard (sparse-v3)              1.29     0.34 -73.6%
>> 2000.29: git reset --hard (sparse-v4)              1.31     0.34 -74.0%
>> 2000.30: git reset -- does-not-exist (full-v3)     0.54     0.51 -5.6% 
>> 2000.31: git reset -- does-not-exist (full-v4)     0.54     0.52 -3.7% 
>> 2000.32: git reset -- does-not-exist (sparse-v3)   1.02     0.31 -69.6%
>> 2000.33: git reset -- does-not-exist (sparse-v4)   1.07     0.30 -72.0%
> 
> This series looks like it really improves some cases, but at the cost of
> that -70% improvement we've got a ~5% regression in 7/7 for the full-v3
> --does-not-exist cases. As noted in your 7/7 (which improves all other
> cases):
> 
>     (full-v3)     0.79(0.38+0.30)   0.91(0.43+0.34) +15.2%
>     (full-v4)     0.80(0.38+0.29)   0.85(0.40+0.35) +6.2%
> 

New performance numbers at the end - I think I have an explanation for this.

> Which b.t.w. I had to read a couple of times before realizig that its
> quoted:
>     
>     Test          before            after
>     ------------------------------------------------------
>     (full-v3)     0.79(0.38+0.30)   0.91(0.43+0.34) +15.2%
>     (full-v4)     0.80(0.38+0.29)   0.85(0.40+0.35) +6.2%
>     (sparse-v3)   0.76(0.43+0.69)   0.44(0.08+0.67) -42.1%
>     (sparse-v4)   0.71(0.40+0.65)   0.41(0.09+0.65) -42.3%
> 
> Is just the does-not-exist part of this bigger table, are the other
> cases all ~0% changed, or ...?
>     

These numbers were for the `git reset -- does-not-exist` case only. If I end
up needing to send a V3, though, I'll probably remove the performance
numbers from 7/7 altogether - looking at them now, they make the commit
message somewhat cluttered. That said, performance numbers *are* helpful for
reviews on the mailing list, so I'd keep the information in the cover
letter at the very least.

> Anyway, until 7/7 the v3 had been sped up, but a ~10% increase landed us
> at ~+6%, and full-v4 had been ~0% but got ~6% worse?
> 
> Is there a way we can get those improvements in performance without
> regressing on the full-* cases?
> 
> Also, these tests only check sparse performance, but isn't some of the
> code being modified here general enough to not be used exclusively by
> the sparse mode, full checkout cone or not?
> 
> It looks fairly easy to extend p2000-sparse-operations.sh to run the
> same tests but just pretend that it's running in a "full" mode without
> actually setting up anyting sparse-specific (the meat of those tests
> just runs "git status" etc. How does that look with this series?
> 

I updated `p2000` locally to do this but the setup was substantially slower
for the full checkout, to the point that it was infeasible to run the
complete test for all relevant commits. Looking at the changes in this
series, nothing appears to affect the full checkout case differently than
the sparse checkout/full index case, so I'm fairly confident there won't be
a regression specific to full checkouts.

> Since only the CL and 7/7 quote numbers from p2000, and 7/7 is at least
> a partial regression, it would be nice to have perf numbers on each
> commit (if only as a one-off for ML consumption). Are there any more
> improvements followed by regressions followed by improvements as we go
> along? Would be useful to know...
> 

I don't think any of the apparent slowdowns seen in these results represent
real regressions. After re-running the performance tests, I saw variability
of up to ~20% execution time across changes with commands that should see no
effect on their execution time (e.g. sparse-v* from 1/7 to 4/7).
Additionally, I saw different increases & decreases each time for each
end-to-end run of the tests. The most reliable, noticeable changes across
the test executions were:

1. When each variant of `git reset` was integrated with sparse index, a
   65-75% execution time reduction in relevant sparse-v* tests.
2. `git reset -- does-not-exist` slower than `git reset` in 6/7,
   then matching its speed after 7/7.
3. As of 7/7, full-v* to sparse-v* showing a 50% execution time reduction.

My guess is that the variability comes from general "uncontrolled" factors
when running the tests (e.g., background processes on my system). The good
news is, when the tests are re-run with more trials (and the recent bugfix
to `t/perf/perf-lib.sh` [1]), the execution times look a lot less worrisome 
(apologies for the table width, but I'd like to err on the side of providing
more complete information):

Test                                               base              [1/7]                    [4/7]                    [5/7]                    [6/7]                    [7/7]            
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2000.22: git reset (full-v3)                       0.44(0.16+0.19)   0.44(0.17+0.18) +0.0%    0.44(0.17+0.19) +0.0%    0.45(0.17+0.18) +2.3%    0.44(0.17+0.19) +0.0%    0.45(0.17+0.18) +2.3% 
2000.23: git reset (full-v4)                       0.43(0.16+0.18)   0.43(0.16+0.19) +0.0%    0.45(0.17+0.18) +4.7%    0.44(0.17+0.18) +2.3%    0.44(0.17+0.18) +2.3%    0.44(0.18+0.18) +2.3% 
2000.24: git reset (sparse-v3)                     0.82(0.54+0.19)   0.84(0.56+0.19) +2.4%    0.81(0.54+0.19) -1.2%    0.88(0.60+0.19) +7.3%    0.27(0.03+0.45) -67.1%   0.27(0.03+0.47) -67.1%
2000.25: git reset (sparse-v4)                     0.82(0.55+0.18)   0.82(0.53+0.20) +0.0%    0.83(0.55+0.19) +1.2%    0.82(0.54+0.19) +0.0%    0.27(0.03+0.50) -67.1%   0.27(0.03+0.48) -67.1%
2000.26: git reset --hard (full-v3)                0.71(0.38+0.24)   0.69(0.37+0.23) -2.8%    0.70(0.37+0.24) -1.4%    0.78(0.41+0.27) +9.9%    0.71(0.38+0.25) +0.0%    0.70(0.37+0.23) -1.4% 
2000.27: git reset --hard (full-v4)                0.71(0.38+0.23)   0.77(0.42+0.25) +8.5%    0.76(0.41+0.26) +7.0%    0.72(0.40+0.24) +1.4%    0.68(0.37+0.23) -4.2%    0.67(0.36+0.22) -5.6%
2000.28: git reset --hard (sparse-v3)              1.29(0.93+0.26)   1.33(0.95+0.27) +3.1%    1.11(0.76+0.25) -14.0%   0.38(0.05+0.25) -70.5%   0.36(0.04+0.22) -72.1%   0.34(0.04+0.21) -73.6%
2000.29: git reset --hard (sparse-v4)              1.17(0.84+0.24)   1.10(0.79+0.23) -6.0%    1.01(0.69+0.24) -13.7%   0.42(0.05+0.26) -64.1%   0.39(0.05+0.25) -66.7%   0.38(0.05+0.23) -67.5%
2000.30: git reset -- does-not-exist (full-v3)     0.50(0.19+0.20)   0.50(0.19+0.20) +0.0%    0.53(0.21+0.22) +6.0%    0.47(0.18+0.19) -6.0%    0.45(0.18+0.18) -10.0%   0.45(0.18+0.19) -10.0%
2000.31: git reset -- does-not-exist (full-v4)     0.45(0.18+0.18)   0.46(0.18+0.19) +2.2%    0.47(0.19+0.19) +4.4%    0.45(0.18+0.19) +0.0%    0.45(0.18+0.18) +0.0%    0.45(0.18+0.18) +0.0% 
2000.32: git reset -- does-not-exist (sparse-v3)   1.01(0.70+0.21)   0.91(0.62+0.20) -9.9%    0.93(0.64+0.20) -7.9%    0.89(0.61+0.20) -11.9%   0.48(0.23+0.46) -52.5%   0.27(0.03+0.49) -73.3%
2000.33: git reset -- does-not-exist (sparse-v4)   0.99(0.67+0.21)   1.02(0.70+0.22) +3.0%    1.04(0.70+0.22) +5.1%    0.83(0.55+0.19) -16.2%   0.48(0.24+0.48) -51.5%   0.27(0.03+0.49) -72.7%

Note that some commits in this series are not included because they don't
touch any code used by `git reset`.

[1] https://lore.kernel.org/git/pull.1051.git.1633386543759.gitgitgadget@gmail.com/
Elijah Newren Oct. 6, 2021, 5:46 a.m. UTC | #3
Hi Victoria,

On Tue, Oct 5, 2021 at 6:20 AM Victoria Dye via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> This series integrates the sparse index with git reset and provides
> miscellaneous fixes and improvements to the command in sparse checkouts.
> This includes:
>
>  1. tests added to t1092 and p2000 to establish the baseline functionality
>     of the command
>  2. repository settings to enable the sparse index with ensure_full_index
>     guarding any code paths that break tests without other compatibility
>     updates.
>  3. modifications to remove or reduce the scope in which ensure_full_index
>     must be called.
>
> The sparse index updates are predicated on a fix originating from the
> microsoft/git fork [1], correcting how git reset --mixed handles resetting
> entries outside the sparse checkout definition. Additionally, a performance
> "bug" in next_cache_entry with sparse index is corrected, preventing
> repeatedly looping over already-searched entries.
>
> The p2000 tests demonstrate an overall ~70% execution time reduction across
> all tested usages of git reset using a sparse index:
>
> Test                                               before   after
> ------------------------------------------------------------------------
> 2000.22: git reset (full-v3)                       0.48     0.51 +6.3%
> 2000.23: git reset (full-v4)                       0.47     0.50 +6.4%
> 2000.24: git reset (sparse-v3)                     0.93     0.30 -67.7%
> 2000.25: git reset (sparse-v4)                     0.94     0.29 -69.1%
> 2000.26: git reset --hard (full-v3)                0.69     0.68 -1.4%
> 2000.27: git reset --hard (full-v4)                0.75     0.68 -9.3%
> 2000.28: git reset --hard (sparse-v3)              1.29     0.34 -73.6%
> 2000.29: git reset --hard (sparse-v4)              1.31     0.34 -74.0%
> 2000.30: git reset -- does-not-exist (full-v3)     0.54     0.51 -5.6%
> 2000.31: git reset -- does-not-exist (full-v4)     0.54     0.52 -3.7%
> 2000.32: git reset -- does-not-exist (sparse-v3)   1.02     0.31 -69.6%
> 2000.33: git reset -- does-not-exist (sparse-v4)   1.07     0.30 -72.0%
>
>
>
> Changes since V1
> ================
>
>  * Add --force-full-index option to update-index. The option is used
>    circumvent changing command_requires_full_index from its default value -
>    right now this is effectively a no-op, but will change once update-index
>    is integrated with sparse index. By using this option in the t1092
>    expand/collapse test, the command used to test will not need to be
>    updated with subsequent sparse index integrations.
>  * Update implementation of mixed reset for entries outside sparse checkout
>    definition. The condition in which a file should be checked out before
>    index reset is simplified to "if it has skip-worktree enabled and a reset
>    would change the file, check it out".
>    * After checking the behavior of update_index_from_diff with renames,
>      found that the diff used by reset does not produce diff queue entries
>      with different pathnames for one and two. Because of this, and that
>      nothing in the implementation seems to rely on identical path names, no
>      BUG check is added.
>  * Correct a bug in the sparse index is not expanded tests in t1092 where
>    failure of a git reset --mixed test was not being reported. Test now
>    verifies an appropriate scenario with corrected failure-checking.

I read over the first six patches.  I tried to read over the seventh,
but I've never figured out cache_bottom for some reason and I did
nothing beyond spot checking when Stolee touched that area either.

Anyway, I had lots of little comments, tweaks to the way to fix the
inconsistency in patch 1, various questions, etc.  It probably adds up
to a lot, but it's all small fixable stuff; overall it looks like you
(and Kevin) are making a solid contribution on the sparse-checkout
stuff; I look forward to reading the next round.