diff mbox series

[v2,6/7] reset: make --mixed sparse-aware

Message ID 5eaae0825af4cee84040b784c32190bb1de2f919.1633440057.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series Sparse Index: integrate with reset | expand

Commit Message

Victoria Dye Oct. 5, 2021, 1:20 p.m. UTC
From: Victoria Dye <vdye@github.com>

Sparse directory entries are "diffed" as trees in `diff_cache` (used
internally by `reset --mixed`), following a code path separate from
individual file handling. The use of `diff_tree_oid` there requires setting
explicit `change` and `add_remove` functions to process the internal
contents of a sparse directory.

Additionally, the `recursive` diff option handles cases in which `reset
--mixed` must diff/merge files that are nested multiple levels deep in a
sparse directory.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 builtin/reset.c                          | 30 +++++++++++++++++++++++-
 t/t1092-sparse-checkout-compatibility.sh | 13 +++++++++-
 2 files changed, 41 insertions(+), 2 deletions(-)

Comments

Elijah Newren Oct. 6, 2021, 4:43 a.m. UTC | #1
On Tue, Oct 5, 2021 at 6:21 AM Victoria Dye via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Victoria Dye <vdye@github.com>
>
> Sparse directory entries are "diffed" as trees in `diff_cache` (used
> internally by `reset --mixed`), following a code path separate from
> individual file handling. The use of `diff_tree_oid` there requires setting
> explicit `change` and `add_remove` functions to process the internal
> contents of a sparse directory.
>
> Additionally, the `recursive` diff option handles cases in which `reset
> --mixed` must diff/merge files that are nested multiple levels deep in a
> sparse directory.
>
> Signed-off-by: Victoria Dye <vdye@github.com>
> ---
>  builtin/reset.c                          | 30 +++++++++++++++++++++++-
>  t/t1092-sparse-checkout-compatibility.sh | 13 +++++++++-
>  2 files changed, 41 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/reset.c b/builtin/reset.c
> index e1f2a2bb2c4..ceb9b122897 100644
> --- a/builtin/reset.c
> +++ b/builtin/reset.c
> @@ -175,6 +175,8 @@ static int read_from_tree(const struct pathspec *pathspec,
>                           int intent_to_add)
>  {
>         struct diff_options opt;
> +       unsigned int i;
> +       char *skip_worktree_seen = NULL;
>
>         memset(&opt, 0, sizeof(opt));
>         copy_pathspec(&opt.pathspec, pathspec);
> @@ -182,9 +184,35 @@ static int read_from_tree(const struct pathspec *pathspec,
>         opt.format_callback = update_index_from_diff;
>         opt.format_callback_data = &intent_to_add;
>         opt.flags.override_submodule_config = 1;
> +       opt.flags.recursive = 1;
>         opt.repo = the_repository;
> +       opt.change = diff_change;
> +       opt.add_remove = diff_addremove;
> +
> +       /*
> +        * When pathspec is given for resetting a cone-mode sparse checkout, it may
> +        * identify entries that are nested in sparse directories, in which case the
> +        * index should be expanded. For the sake of efficiency, this check is
> +        * overly-cautious: anything with a wildcard or a magic prefix requires
> +        * expansion, as well as literal paths that aren't in the sparse checkout
> +        * definition AND don't match any directory in the index.

s/efficiency/efficiency of checking/ ?  Being overly-cautious suggests
you'll expand to a full index more than is needed, and full indexes
are more expensive.  But perhaps the checking would be expensive too
so you have a tradeoff?

Or maybe s/efficiency/simplicity/?

> +        */
> +       if (pathspec->nr && the_index.sparse_index) {
> +               if (pathspec->magic || pathspec->has_wildcard) {
> +                       ensure_full_index(&the_index);

dir.c has the notion of matching the characters preceding the wildcard
characters; look for "no_wildcard_len".  If the pathspec doesn't match
a path up to no_wildcard_len, then the wildcard character(s) later in
the pathspec can't make the pathspec match that path.

It might at least be worth mentioning this as a possible future optimization.

> +               } else {
> +                       for (i = 0; i < pathspec->nr; i++) {
> +                               if (!path_in_cone_mode_sparse_checkout(pathspec->items[i].original, &the_index) &&
> +                                   !matches_skip_worktree(pathspec, i, &skip_worktree_seen)) {

What if the pathspec corresponds to a sparse-directory in the index,
but possibly without the trailing '/' character?  e.g.:

   git reset HEAD~1 -- sparse-directory

One should be able to reset that directory without recursing into
it...does this code handle that?  Does it handle it if we add the
trailing slash on the path for the reset command line?

> +                                       ensure_full_index(&the_index);
> +                                       break;
> +                               }
> +                       }
> +               }
> +       }
> +
> +       free(skip_worktree_seen);
>
> -       ensure_full_index(&the_index);
>         if (do_diff_cache(tree_oid, &opt))
>                 return 1;
>         diffcore_std(&opt);
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index e301ef5633a..4afcbc2d673 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -804,11 +804,22 @@ test_expect_success 'sparse-index is not expanded' '
>                 ensure_not_expanded reset --hard $ref || return 1
>         done &&
>
> +       ensure_not_expanded reset --mixed base &&
>         ensure_not_expanded reset --hard update-deep &&
>         ensure_not_expanded reset --keep base &&
>         ensure_not_expanded reset --merge update-deep &&
> -       ensure_not_expanded reset --hard &&

This commit was only touching the --mixed case; why is it removing one
of the tests for --hard?

>
> +       ensure_not_expanded reset base -- deep/a &&
> +       ensure_not_expanded reset base -- nonexistent-file &&
> +       ensure_not_expanded reset deepest -- deep &&
> +
> +       # Although folder1 is outside the sparse definition, it exists as a
> +       # directory entry in the index, so it will be reset without needing to
> +       # expand the full index.

Ah, I think this answers one of my earlier questions.  Does it also
work with 'folder1/' as well as 'folder1'?

> +       ensure_not_expanded reset --hard update-folder1 &&

Wait...is update-folder1 a branch or a path?  And if this commit is
about --mixed, why are --hard testcases being added?

> +       ensure_not_expanded reset base -- folder1 &&
> +
> +       ensure_not_expanded reset --hard update-deep &&

another --hard testcase...was this an accidental squash by chance?



>         ensure_not_expanded checkout -f update-deep &&
>         test_config -C sparse-index pull.twohead ort &&
>         (
> --
> gitgitgadget
>
Victoria Dye Oct. 7, 2021, 2:34 p.m. UTC | #2
Elijah Newren wrote:
>> +       /*
>> +        * When pathspec is given for resetting a cone-mode sparse checkout, it may
>> +        * identify entries that are nested in sparse directories, in which case the
>> +        * index should be expanded. For the sake of efficiency, this check is
>> +        * overly-cautious: anything with a wildcard or a magic prefix requires
>> +        * expansion, as well as literal paths that aren't in the sparse checkout
>> +        * definition AND don't match any directory in the index.
> 
> s/efficiency/efficiency of checking/ ?  Being overly-cautious suggests
> you'll expand to a full index more than is needed, and full indexes
> are more expensive.  But perhaps the checking would be expensive too
> so you have a tradeoff?
> 
> Or maybe s/efficiency/simplicity/?
> 

"Simplicity" is probably more appropriate, although the original intent was
"efficiency of checking". I wanted to avoid repeated iteration over the
index (for example, matching the `no_wildcard_len` of each wildcard pathspec
item against each sparse directory in the index). However, to your point,
expanding the index is a more expensive operation anyway, so it's probably
worth the more involved checks.

>> +        */
>> +       if (pathspec->nr && the_index.sparse_index) {
>> +               if (pathspec->magic || pathspec->has_wildcard) {
>> +                       ensure_full_index(&the_index);
> 
> dir.c has the notion of matching the characters preceding the wildcard
> characters; look for "no_wildcard_len".  If the pathspec doesn't match
> a path up to no_wildcard_len, then the wildcard character(s) later in
> the pathspec can't make the pathspec match that path.
> 
> It might at least be worth mentioning this as a possible future optimization.
> 

I'll incorporate a something like this into the next version.

>> +               } else {
>> +                       for (i = 0; i < pathspec->nr; i++) {
>> +                               if (!path_in_cone_mode_sparse_checkout(pathspec->items[i].original, &the_index) &&
>> +                                   !matches_skip_worktree(pathspec, i, &skip_worktree_seen)) {
> 
> What if the pathspec corresponds to a sparse-directory in the index,
> but possibly without the trailing '/' character?  e.g.:
> 
>    git reset HEAD~1 -- sparse-directory
> 
> One should be able to reset that directory without recursing into
> it...does this code handle that?  Does it handle it if we add the
> trailing slash on the path for the reset command line?
> 

It handles both cases (with and without trailing slash), the former due to
`!matches_skip_worktree(...)` and the latter due to
`!path_in_cone_mode_sparse_checkout(...)`.

>> +                                       ensure_full_index(&the_index);
>> +                                       break;
>> +                               }
>> +                       }
>> +               }
>> +       }
>> +
>> +       free(skip_worktree_seen);
>>
>> -       ensure_full_index(&the_index);
>>         if (do_diff_cache(tree_oid, &opt))
>>                 return 1;
>>         diffcore_std(&opt);
>> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
>> index e301ef5633a..4afcbc2d673 100755
>> --- a/t/t1092-sparse-checkout-compatibility.sh
>> +++ b/t/t1092-sparse-checkout-compatibility.sh
>> @@ -804,11 +804,22 @@ test_expect_success 'sparse-index is not expanded' '
>>                 ensure_not_expanded reset --hard $ref || return 1
>>         done &&
>>
>> +       ensure_not_expanded reset --mixed base &&
>>         ensure_not_expanded reset --hard update-deep &&
>>         ensure_not_expanded reset --keep base &&
>>         ensure_not_expanded reset --merge update-deep &&
>> -       ensure_not_expanded reset --hard &&
> 
> This commit was only touching the --mixed case; why is it removing one
> of the tests for --hard?
> 

[...]

>> +       ensure_not_expanded reset --hard update-folder1 &&
> 
> Wait...is update-folder1 a branch or a path?  And if this commit is
> about --mixed, why are --hard testcases being added?
> 
>> +       ensure_not_expanded reset base -- folder1 &&
>> +
>> +       ensure_not_expanded reset --hard update-deep &&
> 
> another --hard testcase...was this an accidental squash by chance?
> 

I included `git reset --hard` between the "actual" test cases so that the
`git reset --mixed` tests would start in a "clean" state (clear out any
modified files), but it's unnecessary in most cases so I'll remove them in
V3. To answer your other question, `update-folder1` is a branch.
diff mbox series

Patch

diff --git a/builtin/reset.c b/builtin/reset.c
index e1f2a2bb2c4..ceb9b122897 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -175,6 +175,8 @@  static int read_from_tree(const struct pathspec *pathspec,
 			  int intent_to_add)
 {
 	struct diff_options opt;
+	unsigned int i;
+	char *skip_worktree_seen = NULL;
 
 	memset(&opt, 0, sizeof(opt));
 	copy_pathspec(&opt.pathspec, pathspec);
@@ -182,9 +184,35 @@  static int read_from_tree(const struct pathspec *pathspec,
 	opt.format_callback = update_index_from_diff;
 	opt.format_callback_data = &intent_to_add;
 	opt.flags.override_submodule_config = 1;
+	opt.flags.recursive = 1;
 	opt.repo = the_repository;
+	opt.change = diff_change;
+	opt.add_remove = diff_addremove;
+
+	/*
+	 * When pathspec is given for resetting a cone-mode sparse checkout, it may
+	 * identify entries that are nested in sparse directories, in which case the
+	 * index should be expanded. For the sake of efficiency, this check is
+	 * overly-cautious: anything with a wildcard or a magic prefix requires
+	 * expansion, as well as literal paths that aren't in the sparse checkout
+	 * definition AND don't match any directory in the index.
+	 */
+	if (pathspec->nr && the_index.sparse_index) {
+		if (pathspec->magic || pathspec->has_wildcard) {
+			ensure_full_index(&the_index);
+		} else {
+			for (i = 0; i < pathspec->nr; i++) {
+				if (!path_in_cone_mode_sparse_checkout(pathspec->items[i].original, &the_index) &&
+				    !matches_skip_worktree(pathspec, i, &skip_worktree_seen)) {
+					ensure_full_index(&the_index);
+					break;
+				}
+			}
+		}
+	}
+
+	free(skip_worktree_seen);
 
-	ensure_full_index(&the_index);
 	if (do_diff_cache(tree_oid, &opt))
 		return 1;
 	diffcore_std(&opt);
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index e301ef5633a..4afcbc2d673 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -804,11 +804,22 @@  test_expect_success 'sparse-index is not expanded' '
 		ensure_not_expanded reset --hard $ref || return 1
 	done &&
 
+	ensure_not_expanded reset --mixed base &&
 	ensure_not_expanded reset --hard update-deep &&
 	ensure_not_expanded reset --keep base &&
 	ensure_not_expanded reset --merge update-deep &&
-	ensure_not_expanded reset --hard &&
 
+	ensure_not_expanded reset base -- deep/a &&
+	ensure_not_expanded reset base -- nonexistent-file &&
+	ensure_not_expanded reset deepest -- deep &&
+
+	# Although folder1 is outside the sparse definition, it exists as a
+	# directory entry in the index, so it will be reset without needing to
+	# expand the full index.
+	ensure_not_expanded reset --hard update-folder1 &&
+	ensure_not_expanded reset base -- folder1 &&
+
+	ensure_not_expanded reset --hard update-deep &&
 	ensure_not_expanded checkout -f update-deep &&
 	test_config -C sparse-index pull.twohead ort &&
 	(