mbox series

[v3,00/20] Sparse Index: Design, Format, Tests

Message ID pull.883.v3.git.1615912983.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series Sparse Index: Design, Format, Tests | expand

Message

Phillip Wood via GitGitGadget March 16, 2021, 4:42 p.m. UTC
Here is the first full patch series submission coming out of the
sparse-index RFC [1].

[1]
https://lore.kernel.org/git/pull.847.git.1611596533.gitgitgadget@gmail.com/

I won't waste too much space here, because PATCH 1 includes a sizeable
design document that describes the feature, the reasoning behind it, and my
plan for getting this implemented widely throughout the codebase.

There are some new things here that were not in the RFC:

 * Design doc and format updates. (Patch 1)
 * Performance test script. (Patches 2 and 20)

Notably missing in this series from the RFC:

 * The mega-patch inserting ensure_full_index() throughout the codebase.
   That will be a follow-up series to this one.
 * The integrations with git status and git add to demonstrate the improved
   performance. Those will also appear in their own series later.

I plan to keep my latest work in this area in my 'sparse-index/wip' branch
[2]. It includes all of the work from the RFC right now, updated with the
work from this series.

[2] https://github.com/derrickstolee/git/tree/sparse-index/wip


Updates in V3
=============

For this version, I took Ævar's latest patches and applied them to v2.31.0
and rebased this series on top. It uses his new "read_tree_at()" helper and
the associated changes to the function pointer type.

 * Fixed more typos. Thanks Martin and Elijah!
 * Updated the test_sparse_match() macro to use "$@" instead of $*
 * Added a test that git sparse-checkout init --no-sparse-index rewrites the
   index to be full.


Updates in V2
=============

 * Various typos and awkward grammar is fixed.
 * Cleaned up unnecessary commands in p2000-sparse-operations.sh
 * Added a comment to the sparse_index member of struct index_state.
 * Used tree_type, commit_type, and blob_type in test-read-cache.c.

Thanks, -Stolee

Derrick Stolee (20):
  sparse-index: design doc and format update
  t/perf: add performance test for sparse operations
  t1092: clean up script quoting
  sparse-index: add guard to ensure full index
  sparse-index: implement ensure_full_index()
  t1092: compare sparse-checkout to sparse-index
  test-read-cache: print cache entries with --table
  test-tool: don't force full index
  unpack-trees: ensure full index
  sparse-checkout: hold pattern list in index
  sparse-index: convert from full to sparse
  submodule: sparse-index should not collapse links
  unpack-trees: allow sparse directories
  sparse-index: check index conversion happens
  sparse-index: create extension for compatibility
  sparse-checkout: toggle sparse index from builtin
  sparse-checkout: disable sparse-index
  cache-tree: integrate with sparse directory entries
  sparse-index: loose integration with cache_tree_verify()
  p2000: add sparse-index repos

 Documentation/config/extensions.txt      |   8 +
 Documentation/git-sparse-checkout.txt    |  14 ++
 Documentation/technical/index-format.txt |   7 +
 Documentation/technical/sparse-index.txt | 173 +++++++++++++
 Makefile                                 |   1 +
 builtin/sparse-checkout.c                |  44 +++-
 cache-tree.c                             |  40 ++++
 cache.h                                  |  18 +-
 read-cache.c                             |  35 ++-
 repo-settings.c                          |  15 ++
 repository.c                             |  11 +-
 repository.h                             |   3 +
 setup.c                                  |   3 +
 sparse-index.c                           | 293 +++++++++++++++++++++++
 sparse-index.h                           |  11 +
 t/README                                 |   3 +
 t/helper/test-read-cache.c               |  66 ++++-
 t/perf/p2000-sparse-operations.sh        | 102 ++++++++
 t/t1091-sparse-checkout-builtin.sh       |  13 +
 t/t1092-sparse-checkout-compatibility.sh | 143 +++++++++--
 unpack-trees.c                           |  16 +-
 21 files changed, 979 insertions(+), 40 deletions(-)
 create mode 100644 Documentation/technical/sparse-index.txt
 create mode 100644 sparse-index.c
 create mode 100644 sparse-index.h
 create mode 100755 t/perf/p2000-sparse-operations.sh


base-commit: 9c34e7ffd7b544199d889e2f3f7d9ba663c4357d
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-883%2Fderrickstolee%2Fsparse-index%2Fformat-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-883/derrickstolee/sparse-index/format-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/883

Range-diff vs v2:

  1:  2fe413fdac80 !  1:  62ac13945bec sparse-index: design doc and format update
     @@ Documentation/technical/sparse-index.txt (new)
      +Hopefully, commands such as `git merge` and `git rebase` can benefit
      +instead from merge algorithms that do not use the index as a data
      +structure, such as the merge-ORT strategy. As these topics mature, we
     -+may enalbe the ORT strategy by default for repositories using the
     ++may enable the ORT strategy by default for repositories using the
      +sparse-index feature.
      +
      +Along with `git status` and `git add`, these commands cover the majority
  2:  540ab5495065 =  2:  d2197e895e4d t/perf: add performance test for sparse operations
  3:  5cbedb377b37 =  3:  d3cfd34b8418 t1092: clean up script quoting
  4:  6e21f776e883 =  4:  4472118cf903 sparse-index: add guard to ensure full index
  5:  399ddb0bad56 !  5:  99292cdbaae4 sparse-index: implement ensure_full_index()
     @@ sparse-index.c
      +}
      +
      +static int add_path_to_index(const struct object_id *oid,
     -+				struct strbuf *base, const char *path,
     -+				unsigned int mode, int stage, void *context)
     ++			     struct strbuf *base, const char *path,
     ++			     unsigned int mode, void *context)
      +{
      +	struct index_state *istate = (struct index_state *)context;
      +	struct cache_entry *ce;
     @@ sparse-index.c
      -	/* intentionally left blank */
      +	int i;
      +	struct index_state *full;
     ++	struct strbuf base = STRBUF_INIT;
      +
      +	if (!istate || !istate->sparse_index)
      +		return;
     @@ sparse-index.c
      +		ps.has_wildcard = 1;
      +		ps.max_depth = -1;
      +
     -+		read_tree_recursive(istate->repo, tree,
     -+				    ce->name, strlen(ce->name),
     -+				    0, &ps,
     -+				    add_path_to_index, full);
     ++		strbuf_setlen(&base, 0);
     ++		strbuf_add(&base, ce->name, strlen(ce->name));
     ++
     ++		read_tree_at(istate->repo, tree, &base, &ps,
     ++			     add_path_to_index, full);
      +
      +		/* free directory entries. full entries are re-used */
      +		discard_cache_entry(ce);
     @@ sparse-index.c
      +	istate->cache_nr = full->cache_nr;
      +	istate->cache_alloc = full->cache_alloc;
      +
     ++	strbuf_release(&base);
      +	free(full);
      +
      +	trace2_region_leave("index", "ensure_full_index", istate->repo);
  6:  eac2db5efc22 !  6:  fae5663a17bb t1092: compare sparse-checkout to sparse-index
     @@ Commit message
          add run_on_sparse and test_sparse_match helpers. These helpers will be
          used when the sparse index is implemented.
      
     -    Add GIT_TEST_SPARSE_INDEX environment variable to enable the
     -    sparse-index by default. This will be intended to use across the entire
     -    test suite, except that it will only affect cases where the
     -    sparse-checkout feature is enabled.
     +    Add the GIT_TEST_SPARSE_INDEX environment variable to enable the
     +    sparse-index by default. This can be enabled across all tests, but that
     +    will only affect cases where the sparse-checkout feature is enabled.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
     @@ t/t1092-sparse-checkout-compatibility.sh: test_all_match () {
       }
       
      +test_sparse_match () {
     -+	run_on_sparse $* &&
     ++	run_on_sparse "$@" &&
      +	test_cmp sparse-checkout-out sparse-index-out &&
      +	test_cmp sparse-checkout-err sparse-index-err
      +}
  7:  e9c82d2eda82 =  7:  dffe8821fde2 test-read-cache: print cache entries with --table
  8:  243541fc5820 =  8:  f4ad081f25bb test-tool: don't force full index
  9:  48f65093b3da =  9:  4780076a50df unpack-trees: ensure full index
 10:  83aac8b7a1ec = 10:  33fdba2b8cfd sparse-checkout: hold pattern list in index
 11:  f6db0c27a285 ! 11:  e41b14e03ebb sparse-index: convert from full to sparse
     @@ t/t1092-sparse-checkout-compatibility.sh
       
       test_description='compare full workdir to sparse workdir'
       
     ++# The verify_cache_tree() check is not sparse-aware (yet).
     ++# So, disable the check until that integration is complete.
      +GIT_TEST_CHECK_CACHE_TREE=0
      +GIT_TEST_SPLIT_INDEX=0
      +
     @@ t/t1092-sparse-checkout-compatibility.sh: run_on_all () {
       }
       
       test_sparse_match () {
     --	run_on_sparse $* &&
     -+	run_on_sparse "$@" &&
     - 	test_cmp sparse-checkout-out sparse-index-out &&
     +@@ t/t1092-sparse-checkout-compatibility.sh: test_sparse_match () {
       	test_cmp sparse-checkout-err sparse-index-err
       }
       
 12:  f2a3e7298798 = 12:  b77cd6b02265 submodule: sparse-index should not collapse links
 13:  6f1ebe6ccc08 = 13:  4000c5cdd4cf unpack-trees: allow sparse directories
 14:  3fa684b315fb = 14:  1a2be38b2ca7 sparse-index: check index conversion happens
 15:  d74576d677f6 = 15:  f89891b0ae4e sparse-index: create extension for compatibility
 16:  e530ca5f668d ! 16:  bd703c76c859 sparse-checkout: toggle sparse index from builtin
     @@ Documentation/git-sparse-checkout.txt: To avoid interfering with other worktrees
      ++
      +**WARNING:** Using a sparse index requires modifying the index in a way
      +that is not completely understood by external tools. If you have trouble
     -+with this compatibility, then run `git sparse-checkout sparse-index disable`
     ++with this compatibility, then run `git sparse-checkout init --no-sparse-index`
      +to rewrite your index to not be sparse. Older versions of Git will not
      +understand the `sparseIndex` repository extension and may fail to interact
      +with your repository until it is disabled.
     @@ sparse-index.h: struct index_state;
      
       ## t/t1092-sparse-checkout-compatibility.sh ##
      @@ t/t1092-sparse-checkout-compatibility.sh: test_description='compare full workdir to sparse workdir'
     - 
     + # So, disable the check until that integration is complete.
       GIT_TEST_CHECK_CACHE_TREE=0
       GIT_TEST_SPLIT_INDEX=0
      +GIT_TEST_SPARSE_INDEX=
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse-index cont
       
       	test-tool -C sparse-index read-cache --table >cache &&
       	for dir in deep/deeper2 folder1 folder2 x
     +@@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse-index contents' '
     + 		TREE=$(git -C sparse-index rev-parse HEAD:$dir) &&
     + 		grep "040000 tree $TREE	$dir/" cache \
     + 			|| return 1
     +-	done
     ++	done &&
     ++
     ++	# Disabling the sparse-index removes tree entries with full ones
     ++	git -C sparse-index sparse-checkout init --no-sparse-index &&
     ++
     ++	test-tool -C sparse-index read-cache --table >cache &&
     ++	! grep "040000 tree" cache &&
     ++	test_sparse_match test-tool read-cache --table
     + '
     + 
     + test_expect_success 'expanded in-memory index matches full index' '
      @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'submodule handling' '
       test_expect_success 'sparse-index is expanded and converted back' '
       	init_repos &&
 17:  42d0da9c5def = 17:  598557f90a2a sparse-checkout: disable sparse-index
 18:  6bb0976a6295 ! 18:  c2d0c17db31a cache-tree: integrate with sparse directory entries
     @@ sparse-index.c: int convert_to_sparse(struct index_state *istate)
       	trace2_region_leave("index", "convert_to_sparse", istate->repo);
       	return 0;
      @@ sparse-index.c: void ensure_full_index(struct index_state *istate)
     - 
     + 	strbuf_release(&base);
       	free(full);
       
      +	/* Clear and recompute the cache-tree */
 19:  07f34e80609a ! 19:  6fdd9323c14e sparse-index: loose integration with cache_tree_verify()
     @@ t/t1092-sparse-checkout-compatibility.sh
       
       test_description='compare full workdir to sparse workdir'
       
     +-# The verify_cache_tree() check is not sparse-aware (yet).
     +-# So, disable the check until that integration is complete.
      -GIT_TEST_CHECK_CACHE_TREE=0
       GIT_TEST_SPLIT_INDEX=0
       GIT_TEST_SPARSE_INDEX=
 20:  41e3b56b9c17 = 20:  3db06ac46dd5 p2000: add sparse-index repos

Comments

Derrick Stolee March 16, 2021, 4:59 p.m. UTC | #1
On 3/16/2021 12:42 PM, Derrick Stolee via GitGitGadget wrote:> Updates in V3
> =============
> 
> For this version, I took Ævar's latest patches and applied them to v2.31.0
> and rebased this series on top. It uses his new "read_tree_at()" helper and
> the associated changes to the function pointer type.

Junio, I wanted to call your attention to this change in base.

Here is the relevant part of the range-diff:

>   5:  399ddb0bad56 !  5:  99292cdbaae4 sparse-index: implement ensure_full_index()
>      @@ sparse-index.c
>       +}
>       +
>       +static int add_path_to_index(const struct object_id *oid,
>      -+				struct strbuf *base, const char *path,
>      -+				unsigned int mode, int stage, void *context)
>      ++			     struct strbuf *base, const char *path,
>      ++			     unsigned int mode, void *context)
>       +{
>       +	struct index_state *istate = (struct index_state *)context;
>       +	struct cache_entry *ce;
>      @@ sparse-index.c
>       -	/* intentionally left blank */
>       +	int i;
>       +	struct index_state *full;
>      ++	struct strbuf base = STRBUF_INIT;
>       +
>       +	if (!istate || !istate->sparse_index)
>       +		return;
>      @@ sparse-index.c
>       +		ps.has_wildcard = 1;
>       +		ps.max_depth = -1;
>       +
>      -+		read_tree_recursive(istate->repo, tree,
>      -+				    ce->name, strlen(ce->name),
>      -+				    0, &ps,
>      -+				    add_path_to_index, full);
>      ++		strbuf_setlen(&base, 0);
>      ++		strbuf_add(&base, ce->name, strlen(ce->name));
>      ++
>      ++		read_tree_at(istate->repo, tree, &base, &ps,
>      ++			     add_path_to_index, full);
>       +
>       +		/* free directory entries. full entries are re-used */
>       +		discard_cache_entry(ce);
>      @@ sparse-index.c
>       +	istate->cache_nr = full->cache_nr;
>       +	istate->cache_alloc = full->cache_alloc;
>       +
>      ++	strbuf_release(&base);
>       +	free(full);
>       +
>       +	trace2_region_leave("index", "ensure_full_index", istate->repo);

Thanks,
-Stolee
Elijah Newren March 16, 2021, 9:18 p.m. UTC | #2
On Tue, Mar 16, 2021 at 9:43 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> Here is the first full patch series submission coming out of the
> sparse-index RFC [1].
>
> [1]
> https://lore.kernel.org/git/pull.847.git.1611596533.gitgitgadget@gmail.com/
>
> I won't waste too much space here, because PATCH 1 includes a sizeable
> design document that describes the feature, the reasoning behind it, and my
> plan for getting this implemented widely throughout the codebase.
>
> There are some new things here that were not in the RFC:
>
>  * Design doc and format updates. (Patch 1)
>  * Performance test script. (Patches 2 and 20)
>
> Notably missing in this series from the RFC:
>
>  * The mega-patch inserting ensure_full_index() throughout the codebase.
>    That will be a follow-up series to this one.
>  * The integrations with git status and git add to demonstrate the improved
>    performance. Those will also appear in their own series later.
>
> I plan to keep my latest work in this area in my 'sparse-index/wip' branch
> [2]. It includes all of the work from the RFC right now, updated with the
> work from this series.
>
> [2] https://github.com/derrickstolee/git/tree/sparse-index/wip
>
>
> Updates in V3
> =============
>
> For this version, I took Ævar's latest patches and applied them to v2.31.0
> and rebased this series on top. It uses his new "read_tree_at()" helper and
> the associated changes to the function pointer type.
>
>  * Fixed more typos. Thanks Martin and Elijah!
>  * Updated the test_sparse_match() macro to use "$@" instead of $*
>  * Added a test that git sparse-checkout init --no-sparse-index rewrites the
>    index to be full.

I've read through the range-diff.  Sorry for not spotting the conflict
with Ævar's series (that I also reviewed).  Anyway, my Reviewed-by
from the last series still holds.
Junio C Hamano March 18, 2021, 9:50 p.m. UTC | #3
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> For this version, I took Ævar's latest patches and applied them to v2.31.0
> and rebased this series on top. It uses his new "read_tree_at()" helper and
> the associated changes to the function pointer type.
>
>  * Fixed more typos. Thanks Martin and Elijah!
>  * Updated the test_sparse_match() macro to use "$@" instead of $*
>  * Added a test that git sparse-checkout init --no-sparse-index rewrites the
>    index to be full.

Thanks.  I expect ab/read-tree would be rerolled at least one more
time, if only to straighten out the "oops #5 was screwy, let's patch
it up on top with three more steps", but I do not expect the end
state would be all that different, so tentatively I'll queue these
patches on top of the latest iteration of the topic for now and
hope that the other topic will be updated soonish.
Derrick Stolee March 19, 2021, 1 p.m. UTC | #4
On 3/18/2021 5:50 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> For this version, I took Ævar's latest patches and applied them to v2.31.0
>> and rebased this series on top. It uses his new "read_tree_at()" helper and
>> the associated changes to the function pointer type.
>>
>>  * Fixed more typos. Thanks Martin and Elijah!
>>  * Updated the test_sparse_match() macro to use "$@" instead of $*
>>  * Added a test that git sparse-checkout init --no-sparse-index rewrites the
>>    index to be full.
> 
> Thanks.  I expect ab/read-tree would be rerolled at least one more
> time, if only to straighten out the "oops #5 was screwy, let's patch
> it up on top with three more steps", but I do not expect the end
> state would be all that different, so tentatively I'll queue these
> patches on top of the latest iteration of the topic for now and
> hope that the other topic will be updated soonish.

Thanks. I'm grateful that it can spend some time in 'seen' if only
to avoid these conflicts in the meantime.

I'm waiting for that reroll of ab/read-tree before updating this
version with the feedback from v3.

Thanks,
-Stolee