mbox series

[v3,00/12] Sparse-index: integrate with status

Message ID pull.932.v3.git.1621017072.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series Sparse-index: integrate with status | expand

Message

Philippe Blain via GitGitGadget May 14, 2021, 6:30 p.m. UTC
This is the first "payoff" series in the sparse-index work. It makes 'git
status' very fast when a sparse-index is enabled on a repository with
cone-mode sparse-checkout (and a small populated set).

This is based on ds/sparse-index-protections AND mt/add-rm-sparse-checkout.
The latter branch is needed because it changes the behavior of 'git add'
around sparse entries, which changes the expectations of a test added in
patch 1.

The approach here is to audit the places where ensure_full_index() pops up
while doing normal commands with pathspecs within the sparse-checkout
definition. Each of these are checked and tested. In the end, the
sparse-index is integrated with these features:

 * git status
 * FS Monitor index extension.

The performance tests in p2000-sparse-operations.sh improve by 95% or more,
even when compared with the full-index cases, not just the sparse-index
cases that previously had extra overhead.

Hopefully this is the first example of how ds/sparse-index-protections has
done the basic work to do these conversions safely, making them look easier
than they seemed when starting this adventure.

Thanks, -Stolee


Updates in V3
=============

Sorry that this was a long time coming. I got a little side-tracked on other
projects, but I also worked to get the sparse-index feature working against
the Scalar functional tests, which contain many special cases around the
sparse-checkout feature as they were inherited from special cases that arose
in the virtualized environment of VFS for Git. This version contains my
fixes based on that investigation. Most of these were easy to identify and
fix, but I was blocked for a long time struggling with a bug when combining
the sparse-index with the builtin FS Monitor feature, but I've reported my
findings already [1].

[1]
https://lore.kernel.org/git/0b9e54ba-ac27-e537-7bef-1b4448f92352@gmail.com/

 * Updated comments and tests based on the v2 feedback.
 * Expanded the test repository data shape based on the special cases found
   during my investigation.
 * Added several commits that either fix errors in the status code, or fix
   errors in the previous sparse-index series, specifically:
   * When in a conflict state, the cache-tree fails to update. For now, skip
     writing a sparse-index until this can be resolved more carefully.
   * When expanding a sparse-directory entry, we set the CE_SKIP_WORKTREE
     bit but forgot the CE_EXTENDED bit.
   * git status had failures if there was a sparse-directory entry as the
     first entry within a directory.
   * When expanding a directory to report its status, such as when a
     sparse-directory is staged but doesn't exist at HEAD (such as in an
     orphaned commit) we did not previously recurse correctly into
     subdirectories.
   * Be extra careful with the FS Monitor data when expanding or contracting
     an index. This version now abandons all FS Monitor data at these
     conversion points with the expectation that in the future these
     conversions will be rare so the FS Monitor feature can work
     efficiently. Updates in V2

----------------------------------------------------------------------------

 * Based on the feedback, it is clear that 'git add' will require much more
   careful testing and thought. I'm splitting it out of this series and it
   will return with a follow-up.
 * Test cases are improved, both in coverage and organization.
 * The previous "unpack-trees: make sparse aware" patch is split into three
   now.
 * Stale messages based on an old implementation of the "protections" topic
   are now fixed.
 * Performance tests were re-run.

Derrick Stolee (12):
  sparse-index: skip indexes with unmerged entries
  sparse-index: include EXTENDED flag when expanding
  t1092: expand repository data shape
  t1092: add tests for status/add and sparse files
  unpack-trees: preserve cache_bottom
  unpack-trees: compare sparse directories correctly
  unpack-trees: stop recursing into sparse directories
  dir.c: accept a directory as part of cone-mode patterns
  status: skip sparse-checkout percentage with sparse-index
  status: use sparse-index throughout
  wt-status: expand added sparse directory entries
  fsmonitor: integrate with sparse index

 builtin/commit.c                         |   3 +
 diff-lib.c                               |   6 ++
 dir.c                                    |  11 +++
 read-cache.c                             |  10 +-
 sparse-index.c                           |  27 +++++-
 t/t1092-sparse-checkout-compatibility.sh | 117 ++++++++++++++++++++++-
 t/t7519-status-fsmonitor.sh              |  48 ++++++++++
 unpack-trees.c                           |  27 +++++-
 wt-status.c                              |  64 ++++++++++++-
 wt-status.h                              |   1 +
 10 files changed, 300 insertions(+), 14 deletions(-)


base-commit: f723f370c89ad61f4f40aabfd3540b1ce19c00e5
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-932%2Fderrickstolee%2Fsparse-index%2Fstatus-and-add-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-932/derrickstolee/sparse-index/status-and-add-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/932

Range-diff vs v2:

  -:  ------------ >  1:  5a2ed3d1d701 sparse-index: skip indexes with unmerged entries
  -:  ------------ >  2:  8aa41e749471 sparse-index: include EXTENDED flag when expanding
  -:  ------------ >  3:  70971b1f9261 t1092: expand repository data shape
  1:  3bac9edae7d8 !  4:  a80b5a41153f t1092: add tests for status/add and sparse files
     @@ Commit message
          and files outside of the sparse cone.
      
          Specifically, 'folder1/a' is a file in our test repo, but 'folder1' is
     -    not in the sparse cone. When 'folder1/a' is modified, the file
     -    'folder1/a' is shown as modified, but adding it fails. This is new
     -    behavior as of a20f704 (add: warn when asked to update SKIP_WORKTREE
     -    entries, 2021-04-08). Before that change, these adds would be silently
     -    ignored.
     +    not in the sparse cone. When 'folder1/a' is modified, the file is not
     +    shown as modified and adding it will fail. This is new behavior as of
     +    a20f704 (add: warn when asked to update SKIP_WORKTREE entries,
     +    2021-04-08). Before that change, these adds would be silently ignored.
      
          Untracked files are fine: adding new files both with 'git add .' and
          'git add folder1/' works just as in a full checkout. This may not be
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'add, commit, chec
      +test_expect_success 'status/add: outside sparse cone' '
      +	init_repos &&
      +
     ++	# adding a "missing" file outside the cone should fail
     ++	test_sparse_match test_must_fail git add folder1/a &&
     ++
      +	# folder1 is at HEAD, but outside the sparse cone
      +	run_on_sparse mkdir folder1 &&
      +	cp initial-repo/folder1/a sparse-checkout/folder1/a &&
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'add, commit, chec
      +	write_script edit-contents <<-\EOF &&
      +	echo text >>$1
      +	EOF
     -+	run_on_all ../edit-contents folder1/a &&
     ++	run_on_sparse ../edit-contents folder1/a &&
      +	run_on_all ../edit-contents folder1/new &&
      +
      +	test_sparse_match git status --porcelain=v2 &&
      +
     -+	# This "git add folder1/a" is completely ignored
     -+	# by the sparse-checkout repos. It causes the
     -+	# full repo to have a different staged environment.
     -+	#
     -+	# This is not a desirable behavior, but this test
     -+	# ensures that the sparse-index is not the cause
     -+	# of a behavior change.
     ++	# This "git add folder1/a" fails with a warning
     ++	# in the sparse repos, differing from the full
     ++	# repo. This is intentional.
      +	test_sparse_match test_must_fail git add folder1/a &&
      +	test_sparse_match test_must_fail git add --refresh folder1/a &&
     -+	git -C full-checkout checkout HEAD -- folder1/a &&
      +	test_all_match git status --porcelain=v2 &&
      +
      +	test_all_match git add . &&
  2:  19344394379d =  5:  07a45b661c4a unpack-trees: preserve cache_bottom
  3:  24e71d8c0622 !  6:  cc4a526e7947 unpack-trees: compare sparse directories correctly
     @@ unpack-trees.c: static int compare_entry(const struct cache_entry *ce, const str
       	if (cmp)
       		return cmp;
       
     -+	/* If ce is a sparse directory, then allow an exact match. */
     ++	/*
     ++	 * At this point, we know that we have a prefix match. If ce
     ++	 * is a sparse directory, then allow an exact match. This only
     ++	 * works when the input name is a directory, since ce->name
     ++	 * ends in a directory separator.
     ++	 */
      +	if (S_ISSPARSEDIR(ce->ce_mode))
      +		return 0;
      +
  4:  d3c8948d0a33 !  7:  598375d3531f unpack-trees: stop recursing into sparse directories
     @@ Commit message
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
     + ## diff-lib.c ##
     +@@ diff-lib.c: static void show_new_file(struct rev_info *revs,
     + 	unsigned int mode;
     + 	unsigned dirty_submodule = 0;
     + 
     ++	if (S_ISSPARSEDIR(new_file->ce_mode))
     ++		return;
     ++
     + 	/*
     + 	 * New file in the index: it might actually be different in
     + 	 * the working tree.
     +@@ diff-lib.c: static int show_modified(struct rev_info *revs,
     + 	const struct object_id *oid;
     + 	unsigned dirty_submodule = 0;
     + 
     ++	if (S_ISSPARSEDIR(new_entry->ce_mode))
     ++		return 0;
     ++
     + 	if (get_stat_data(new_entry, &oid, &mode, cached, match_missing,
     + 			  &dirty_submodule, &revs->diffopt) < 0) {
     + 		if (report_missing)
     +
       ## unpack-trees.c ##
      @@ unpack-trees.c: static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
       	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
     @@ unpack-trees.c: static int unpack_callback(int n, unsigned long mask, unsigned l
       	/* Find first entry with a real name (we could use "mask" too) */
       	while (!p->mode)
      @@ unpack-trees.c: static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
     - 					}
     - 				}
     - 				src[0] = ce;
     -+
     -+				if (S_ISSPARSEDIR(ce->ce_mode))
     -+					unpack_tree = 0;
     - 			}
     - 			break;
       		}
       	}
       
  5:  fd96b71968b6 =  8:  47da2b317237 dir.c: accept a directory as part of cone-mode patterns
  6:  1f4ba56e7416 =  9:  bc1512981493 status: skip sparse-checkout percentage with sparse-index
  7:  3d09368c0541 = 10:  5b1ae369a7cd status: use sparse-index throughout
  -:  ------------ > 11:  3b42783d4a86 wt-status: expand added sparse directory entries
  8:  1fd033a6ebb2 ! 12:  b72507f514d1 fsmonitor: test with sparse index
     @@ Metadata
      Author: Derrick Stolee <dstolee@microsoft.com>
      
       ## Commit message ##
     -    fsmonitor: test with sparse index
     +    fsmonitor: integrate with sparse index
      
     -    During the effort to protect uses of the index to operate on a full
     -    index, we did not modify fsmonitor.c. This is because it already works
     -    effectively with only the change to index_name_stage_pos(). The only
     -    thing left to do is to test that it works correctly.
     +    If we need to expand a sparse-index into a full one, then the FS Monitor
     +    bitmap is going to be incorrect. Ensure that we start fresh at such an
     +    event.
     +
     +    While this is currently a performance drawback, the eventual hope of the
     +    sparse-index feature is that these expansions will be rare and hence we
     +    will be able to keep the FS Monitor data accurate across multiple Git
     +    commands.
      
          These tests are added to demonstrate that the behavior is the same
          across a full index and a sparse index, but also that file modifications
     @@ Commit message
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
     + ## sparse-index.c ##
     +@@ sparse-index.c: int convert_to_sparse(struct index_state *istate)
     + 	cache_tree_free(&istate->cache_tree);
     + 	cache_tree_update(istate, 0);
     + 
     ++	istate->fsmonitor_has_run_once = 0;
     ++	FREE_AND_NULL(istate->fsmonitor_dirty);
     ++	FREE_AND_NULL(istate->fsmonitor_last_update);
     ++
     + 	istate->sparse_index = 1;
     + 	trace2_region_leave("index", "convert_to_sparse", istate->repo);
     + 	return 0;
     +@@ sparse-index.c: void ensure_full_index(struct index_state *istate)
     + 	istate->cache = full->cache;
     + 	istate->cache_nr = full->cache_nr;
     + 	istate->cache_alloc = full->cache_alloc;
     ++	istate->fsmonitor_has_run_once = 0;
     ++	FREE_AND_NULL(istate->fsmonitor_dirty);
     ++	FREE_AND_NULL(istate->fsmonitor_last_update);
     + 
     + 	strbuf_release(&base);
     + 	free(full);
     +
       ## t/t7519-status-fsmonitor.sh ##
      @@ t/t7519-status-fsmonitor.sh: test_expect_success 'setup' '
       	expect*