mbox series

[v8,00/15] Sparse-index: integrate with status

Message ID pull.932.v8.git.1626112556.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series Sparse-index: integrate with status | expand

Message

Johannes Schindelin via GitGitGadget July 12, 2021, 5:55 p.m. UTC
This is the first "payoff" series in the sparse-index work. It makes 'git
status' very fast when a sparse-index is enabled on a repository with
cone-mode sparse-checkout (and a small populated set).

This is based on ds/sparse-index-protections AND mt/add-rm-sparse-checkout.
The latter branch is needed because it changes the behavior of 'git add'
around sparse entries, which changes the expectations of a test added in
patch 1.

The approach here is to audit the places where ensure_full_index() pops up
while doing normal commands with pathspecs within the sparse-checkout
definition. Each of these are checked and tested. In the end, the
sparse-index is integrated with these features:

 * git status
 * FS Monitor index extension.

The performance tests in p2000-sparse-operations.sh improve by 95% or more,
even when compared with the full-index cases, not just the sparse-index
cases that previously had extra overhead.

Hopefully this is the first example of how ds/sparse-index-protections has
done the basic work to do these conversions safely, making them look easier
than they seemed when starting this adventure.

Thanks, -Stolee


Update in V8
============

 * The directory/file conflict patch is removed and delayed to the next
   series where it will be required. (It will also be improved in that
   series.)

 * Some comments have been improved, including a new assert() that helps
   document the situation.

Derrick Stolee (15):
  sparse-index: skip indexes with unmerged entries
  sparse-index: include EXTENDED flag when expanding
  t1092: replace incorrect 'echo' with 'cat'
  t1092: expand repository data shape
  t1092: add tests for status/add and sparse files
  unpack-trees: preserve cache_bottom
  unpack-trees: compare sparse directories correctly
  unpack-trees: rename unpack_nondirectories()
  unpack-trees: unpack sparse directory entries
  dir.c: accept a directory as part of cone-mode patterns
  diff-lib: handle index diffs with sparse dirs
  status: skip sparse-checkout percentage with sparse-index
  status: use sparse-index throughout
  wt-status: expand added sparse directory entries
  fsmonitor: integrate with sparse index

 builtin/commit.c                         |   3 +
 diff-lib.c                               |  19 +++
 dir.c                                    |  24 +++-
 read-cache.c                             |  10 +-
 sparse-index.c                           |  27 +++-
 t/t1092-sparse-checkout-compatibility.sh | 158 ++++++++++++++++++++++-
 t/t7519-status-fsmonitor.sh              |  49 +++++++
 unpack-trees.c                           | 142 +++++++++++++++++---
 wt-status.c                              |  65 +++++++++-
 wt-status.h                              |   1 +
 10 files changed, 464 insertions(+), 34 deletions(-)


base-commit: d486ca60a51c9cb1fe068803c3f540724e95e83a
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-932%2Fderrickstolee%2Fsparse-index%2Fstatus-and-add-v8
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-932/derrickstolee/sparse-index/status-and-add-v8
Pull-Request: https://github.com/gitgitgadget/git/pull/932

Range-diff vs v7:

  1:  2a4a7256304 =  1:  1815c148e8c sparse-index: skip indexes with unmerged entries
  2:  f5bae86014d =  2:  7bcde075d8d sparse-index: include EXTENDED flag when expanding
  3:  d965669c766 =  3:  05981e30b97 t1092: replace incorrect 'echo' with 'cat'
  4:  e10fa11cfdb !  4:  d38b66e9ee4 t1092: expand repository data shape
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'diff --staged' '
       	for branch in rename-out-to-out rename-out-to-in rename-in-to-out
       	do
       		test_all_match git checkout rename-base &&
     - 		test_all_match git checkout $branch -- .&&
     + 		test_all_match git checkout $branch -- . &&
      +		test_all_match git status --porcelain=v2 &&
      +		test_all_match git diff --staged --no-renames &&
      +		test_all_match git diff --staged --find-renames || return 1
  5:  e94ffa07d46 =  5:  95ddd3abe4e t1092: add tests for status/add and sparse files
  6:  a8dda933567 =  6:  b182b456613 unpack-trees: preserve cache_bottom
  7:  e52166f6e4c =  7:  988ddce4d45 unpack-trees: compare sparse directories correctly
  8:  d04b62381b8 =  8:  d67ad048b08 unpack-trees: rename unpack_nondirectories()
  9:  237ccf4e43d !  9:  c0b0b58584c unpack-trees: unpack sparse directory entries
     @@ unpack-trees.c: static int find_cache_pos(struct traverse_info *info,
      +	 * Check for a sparse-directory entry named "path/".
      +	 * Due to the input p->path not having a trailing
      +	 * slash, the negative 'pos' value overshoots the
     -+	 * expected position by at least one, hence "-2" here.
     ++	 * expected position, hence "-2" instead of "-1".
      +	 */
      +	pos = -pos - 2;
      +
     @@ unpack-trees.c: static int find_cache_pos(struct traverse_info *info,
       		return NULL;
      +
      +	/*
     -+	 * We might have multiple entries between 'pos' and
     -+	 * the actual sparse-directory entry, so start walking
     -+	 * back until finding it or passing where it would be.
     ++	 * Due to lexicographic sorting and sparse directory
     ++	 * entried ending with a trailing slash, our path as a
     ++	 * sparse directory (e.g "subdir/") and	our path as a
     ++	 * file (e.g. "subdir") might be separated by other
     ++	 * paths (e.g. "subdir-").
      +	 */
      +	while (pos >= 0) {
      +		ce = o->src_index->cache[pos];
 10:  9f31c691af6 <  -:  ----------- unpack-trees: handle dir/file conflict of sparse entries
 11:  2a43287c47e = 10:  76c7528f78f dir.c: accept a directory as part of cone-mode patterns
 12:  f83aa08ff6b ! 11:  d875a7f8585 diff-lib: handle index diffs with sparse dirs
     @@ diff-lib.c: static int show_modified(struct rev_info *revs,
       	unsigned dirty_submodule = 0;
       	struct index_state *istate = revs->diffopt.repo->index;
       
     ++	assert(S_ISSPARSEDIR(old_entry->ce_mode) ==
     ++	       S_ISSPARSEDIR(new_entry->ce_mode));
     ++
      +	/*
      +	 * If both are sparse directory entries, then expand the
     -+	 * modifications to the file level.
     ++	 * modifications to the file level. If only one was a sparse
     ++	 * directory, then they appear as an add and delete instead of
     ++	 * a modification.
      +	 */
     -+	if (old_entry && new_entry &&
     -+	    S_ISSPARSEDIR(old_entry->ce_mode) &&
     -+	    S_ISSPARSEDIR(new_entry->ce_mode)) {
     ++	if (S_ISSPARSEDIR(new_entry->ce_mode)) {
      +		diff_tree_oid(&old_entry->oid, &new_entry->oid, new_entry->name, &revs->diffopt);
      +		return 0;
      +	}
 13:  35063ffb8ed = 12:  2b72cc2d985 status: skip sparse-checkout percentage with sparse-index
 14:  b4033a9bf36 = 13:  1c1feef3733 status: use sparse-index throughout
 15:  717a3f49f97 = 14:  dada1b91bdc wt-status: expand added sparse directory entries
 16:  1d744848ee6 = 15:  bdc771cf373 fsmonitor: integrate with sparse index

Comments

Elijah Newren July 12, 2021, 7:38 p.m. UTC | #1
On Mon, Jul 12, 2021 at 10:55 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> This is the first "payoff" series in the sparse-index work. It makes 'git
> status' very fast when a sparse-index is enabled on a repository with
> cone-mode sparse-checkout (and a small populated set).
>
...
> Update in V8
> ============
>
>  * The directory/file conflict patch is removed and delayed to the next
>    series where it will be required. (It will also be improved in that
>    series.)
>
>  * Some comments have been improved, including a new assert() that helps
>    document the situation.
>

This one looks really good.  Just two minor comments:

> Range-diff vs v7:
>
...
>   9:  237ccf4e43d !  9:  c0b0b58584c unpack-trees: unpack sparse directory entries
>      @@ unpack-trees.c: static int find_cache_pos(struct traverse_info *info,
>       +  * Check for a sparse-directory entry named "path/".
>       +  * Due to the input p->path not having a trailing
>       +  * slash, the negative 'pos' value overshoots the
>      -+  * expected position by at least one, hence "-2" here.
>      ++  * expected position, hence "-2" instead of "-1".
>       +  */
>       + pos = -pos - 2;
>       +
>      @@ unpack-trees.c: static int find_cache_pos(struct traverse_info *info,
>                 return NULL;
>       +
>       + /*
>      -+  * We might have multiple entries between 'pos' and
>      -+  * the actual sparse-directory entry, so start walking
>      -+  * back until finding it or passing where it would be.
>      ++  * Due to lexicographic sorting and sparse directory
>      ++  * entried ending with a trailing slash, our path as a

s/entried/entries/ ?


>      ++  * sparse directory (e.g "subdir/") and our path as a
>      ++  * file (e.g. "subdir") might be separated by other
>      ++  * paths (e.g. "subdir-").
>       +  */
>       + while (pos >= 0) {
>       +         ce = o->src_index->cache[pos];
...
>  15:  717a3f49f97 = 14:  dada1b91bdc wt-status: expand added sparse directory entries

As I commented over at [1], I would appreciate if we could at least
add a comment in the testcase that we know this testcase triggers a
bug for both sparse-index and sparse-checkout...and that fixing it
might affect the other comments and commands within that testcase in
the future...but for now, we're just testing as best we can that the
two give the same behavior.

[1] https://lore.kernel.org/git/CABPp-BGJ+LTubgS=zvGJjk3kgyfW-7UFEa=qg-0mdyrY32j0pQ@mail.gmail.com/


If you agree and include the two fixups above, the entire series is:
Reviewed-by: Elijah Newren <newren@gmail.com>

If you disagree, then all patches other than 9 and 14 can have my
Reviewed-by tag.  :-)


Thanks for all the awesome work!
Derrick Stolee July 13, 2021, 12:57 p.m. UTC | #2
On 7/12/2021 3:38 PM, Elijah Newren wrote:
> On Mon, Jul 12, 2021 at 10:55 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>   9:  237ccf4e43d !  9:  c0b0b58584c unpack-trees: unpack sparse directory entries
>>      @@ unpack-trees.c: static int find_cache_pos(struct traverse_info *info,
>>       +  * Check for a sparse-directory entry named "path/".
>>       +  * Due to the input p->path not having a trailing
>>       +  * slash, the negative 'pos' value overshoots the
>>      -+  * expected position by at least one, hence "-2" here.
>>      ++  * expected position, hence "-2" instead of "-1".
>>       +  */
>>       + pos = -pos - 2;
>>       +
>>      @@ unpack-trees.c: static int find_cache_pos(struct traverse_info *info,
>>                 return NULL;
>>       +
>>       + /*
>>      -+  * We might have multiple entries between 'pos' and
>>      -+  * the actual sparse-directory entry, so start walking
>>      -+  * back until finding it or passing where it would be.
>>      ++  * Due to lexicographic sorting and sparse directory
>>      ++  * entried ending with a trailing slash, our path as a
> 
> s/entried/entries/ ?

Oops! Yes, that would be a valuable fixup. Thanks for catching it.

> 
>>      ++  * sparse directory (e.g "subdir/") and our path as a
>>      ++  * file (e.g. "subdir") might be separated by other
>>      ++  * paths (e.g. "subdir-").
>>       +  */
>>       + while (pos >= 0) {
>>       +         ce = o->src_index->cache[pos];
> ...
>>  15:  717a3f49f97 = 14:  dada1b91bdc wt-status: expand added sparse directory entries
> 
> As I commented over at [1], I would appreciate if we could at least
> add a comment in the testcase that we know this testcase triggers a
> bug for both sparse-index and sparse-checkout...and that fixing it
> might affect the other comments and commands within that testcase in
> the future...but for now, we're just testing as best we can that the
> two give the same behavior.
> 
> [1] https://lore.kernel.org/git/CABPp-BGJ+LTubgS=zvGJjk3kgyfW-7UFEa=qg-0mdyrY32j0pQ@mail.gmail.com/

How do you feel about a new patch that focuses on adding these
comments, including an older test that had a similar documentation
of the behavior change? A patch that could be queued on top of
this series is pasted below the cutline.

Thanks,
-Stolee


-- >8 --

From 8e69def90f5844c117cc1e9efd673c92b85c9238 Mon Sep 17 00:00:00 2001
From: Derrick Stolee <dstolee@microsoft.com>
Date: Tue, 13 Jul 2021 08:50:24 -0400
Subject: [PATCH 16/15] t1092: document bad sparse-checkout behavior

There are several situations where a repository with sparse-checkout
enabled will act differently than a normal repository, and in ways that
are not intentional. The test t1092-sparse-checkout-compatibility.sh
documents some of these deviations, but a casual reader might think
these are intentional behavior changes.

Add comments on these tests that make it clear that these behaviors
should be updated. Using 'NEEDSWORK' helps contributors find that these
are potential areas for improvement.

Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 2394c36d881..cabbd42e339 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -392,8 +392,8 @@ test_expect_failure 'blame with pathspec outside sparse definition' '
 	test_all_match git blame deep/deeper2/deepest/a
 '
 
-# TODO: reset currently does not behave as expected when in a
-# sparse-checkout.
+# NEEDSWORK: a sparse-checkout behaves differently from a full checkout
+# in this scenario, but it shouldn't.
 test_expect_failure 'checkout and reset (mixed)' '
 	init_repos &&
 
@@ -403,8 +403,8 @@ test_expect_failure 'checkout and reset (mixed)' '
 	test_all_match git reset update-folder2
 '
 
-# Ensure that sparse-index behaves identically to
-# sparse-checkout with a full index.
+# NEEDSWORK: a sparse-checkout behaves differently from a full checkout
+# in this scenario, but it shouldn't.
 test_expect_success 'checkout and reset (mixed) [sparse]' '
 	init_repos &&
 
@@ -524,6 +524,8 @@ test_expect_success 'sparse-index is not expanded' '
 	test_region ! index ensure_full_index trace2.txt
 '
 
+# NEEDSWORK: a sparse-checkout behaves differently from a full checkout
+# in this scenario, but it shouldn't.
 test_expect_success 'reset mixed and checkout orphan' '
 	init_repos &&
Elijah Newren July 13, 2021, 5:37 p.m. UTC | #3
On Tue, Jul 13, 2021 at 5:57 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 7/12/2021 3:38 PM, Elijah Newren wrote:
> > On Mon, Jul 12, 2021 at 10:55 AM Derrick Stolee via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
> >>   9:  237ccf4e43d !  9:  c0b0b58584c unpack-trees: unpack sparse directory entries
> >>      @@ unpack-trees.c: static int find_cache_pos(struct traverse_info *info,
> >>       +  * Check for a sparse-directory entry named "path/".
> >>       +  * Due to the input p->path not having a trailing
> >>       +  * slash, the negative 'pos' value overshoots the
> >>      -+  * expected position by at least one, hence "-2" here.
> >>      ++  * expected position, hence "-2" instead of "-1".
> >>       +  */
> >>       + pos = -pos - 2;
> >>       +
> >>      @@ unpack-trees.c: static int find_cache_pos(struct traverse_info *info,
> >>                 return NULL;
> >>       +
> >>       + /*
> >>      -+  * We might have multiple entries between 'pos' and
> >>      -+  * the actual sparse-directory entry, so start walking
> >>      -+  * back until finding it or passing where it would be.
> >>      ++  * Due to lexicographic sorting and sparse directory
> >>      ++  * entried ending with a trailing slash, our path as a
> >
> > s/entried/entries/ ?
>
> Oops! Yes, that would be a valuable fixup. Thanks for catching it.
>
> >
> >>      ++  * sparse directory (e.g "subdir/") and our path as a
> >>      ++  * file (e.g. "subdir") might be separated by other
> >>      ++  * paths (e.g. "subdir-").
> >>       +  */
> >>       + while (pos >= 0) {
> >>       +         ce = o->src_index->cache[pos];
> > ...
> >>  15:  717a3f49f97 = 14:  dada1b91bdc wt-status: expand added sparse directory entries
> >
> > As I commented over at [1], I would appreciate if we could at least
> > add a comment in the testcase that we know this testcase triggers a
> > bug for both sparse-index and sparse-checkout...and that fixing it
> > might affect the other comments and commands within that testcase in
> > the future...but for now, we're just testing as best we can that the
> > two give the same behavior.
> >
> > [1] https://lore.kernel.org/git/CABPp-BGJ+LTubgS=zvGJjk3kgyfW-7UFEa=qg-0mdyrY32j0pQ@mail.gmail.com/
>
> How do you feel about a new patch that focuses on adding these
> comments, including an older test that had a similar documentation
> of the behavior change? A patch that could be queued on top of
> this series is pasted below the cutline.

Looks good to me.  Re-roll with my Reviewed-by and let's get this
series merged down to next.  :-)

>
> Thanks,
> -Stolee
>
>
> -- >8 --
>
> From 8e69def90f5844c117cc1e9efd673c92b85c9238 Mon Sep 17 00:00:00 2001
> From: Derrick Stolee <dstolee@microsoft.com>
> Date: Tue, 13 Jul 2021 08:50:24 -0400
> Subject: [PATCH 16/15] t1092: document bad sparse-checkout behavior
>
> There are several situations where a repository with sparse-checkout
> enabled will act differently than a normal repository, and in ways that
> are not intentional. The test t1092-sparse-checkout-compatibility.sh
> documents some of these deviations, but a casual reader might think
> these are intentional behavior changes.
>
> Add comments on these tests that make it clear that these behaviors
> should be updated. Using 'NEEDSWORK' helps contributors find that these
> are potential areas for improvement.
>
> Helped-by: Elijah Newren <newren@gmail.com>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  t/t1092-sparse-checkout-compatibility.sh | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 2394c36d881..cabbd42e339 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -392,8 +392,8 @@ test_expect_failure 'blame with pathspec outside sparse definition' '
>         test_all_match git blame deep/deeper2/deepest/a
>  '
>
> -# TODO: reset currently does not behave as expected when in a
> -# sparse-checkout.
> +# NEEDSWORK: a sparse-checkout behaves differently from a full checkout
> +# in this scenario, but it shouldn't.
>  test_expect_failure 'checkout and reset (mixed)' '
>         init_repos &&
>
> @@ -403,8 +403,8 @@ test_expect_failure 'checkout and reset (mixed)' '
>         test_all_match git reset update-folder2
>  '
>
> -# Ensure that sparse-index behaves identically to
> -# sparse-checkout with a full index.
> +# NEEDSWORK: a sparse-checkout behaves differently from a full checkout
> +# in this scenario, but it shouldn't.
>  test_expect_success 'checkout and reset (mixed) [sparse]' '
>         init_repos &&
>
> @@ -524,6 +524,8 @@ test_expect_success 'sparse-index is not expanded' '
>         test_region ! index ensure_full_index trace2.txt
>  '
>
> +# NEEDSWORK: a sparse-checkout behaves differently from a full checkout
> +# in this scenario, but it shouldn't.
>  test_expect_success 'reset mixed and checkout orphan' '
>         init_repos &&
>
> --