Message ID | 5eaae0825af4cee84040b784c32190bb1de2f919.1633440057.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Sparse Index: integrate with reset | expand |
On Tue, Oct 5, 2021 at 6:21 AM Victoria Dye via GitGitGadget <gitgitgadget@gmail.com> wrote: > > From: Victoria Dye <vdye@github.com> > > Sparse directory entries are "diffed" as trees in `diff_cache` (used > internally by `reset --mixed`), following a code path separate from > individual file handling. The use of `diff_tree_oid` there requires setting > explicit `change` and `add_remove` functions to process the internal > contents of a sparse directory. > > Additionally, the `recursive` diff option handles cases in which `reset > --mixed` must diff/merge files that are nested multiple levels deep in a > sparse directory. > > Signed-off-by: Victoria Dye <vdye@github.com> > --- > builtin/reset.c | 30 +++++++++++++++++++++++- > t/t1092-sparse-checkout-compatibility.sh | 13 +++++++++- > 2 files changed, 41 insertions(+), 2 deletions(-) > > diff --git a/builtin/reset.c b/builtin/reset.c > index e1f2a2bb2c4..ceb9b122897 100644 > --- a/builtin/reset.c > +++ b/builtin/reset.c > @@ -175,6 +175,8 @@ static int read_from_tree(const struct pathspec *pathspec, > int intent_to_add) > { > struct diff_options opt; > + unsigned int i; > + char *skip_worktree_seen = NULL; > > memset(&opt, 0, sizeof(opt)); > copy_pathspec(&opt.pathspec, pathspec); > @@ -182,9 +184,35 @@ static int read_from_tree(const struct pathspec *pathspec, > opt.format_callback = update_index_from_diff; > opt.format_callback_data = &intent_to_add; > opt.flags.override_submodule_config = 1; > + opt.flags.recursive = 1; > opt.repo = the_repository; > + opt.change = diff_change; > + opt.add_remove = diff_addremove; > + > + /* > + * When pathspec is given for resetting a cone-mode sparse checkout, it may > + * identify entries that are nested in sparse directories, in which case the > + * index should be expanded. For the sake of efficiency, this check is > + * overly-cautious: anything with a wildcard or a magic prefix requires > + * expansion, as well as literal paths that aren't in the sparse checkout > + * definition AND don't match any directory in the index. s/efficiency/efficiency of checking/ ? Being overly-cautious suggests you'll expand to a full index more than is needed, and full indexes are more expensive. But perhaps the checking would be expensive too so you have a tradeoff? Or maybe s/efficiency/simplicity/? > + */ > + if (pathspec->nr && the_index.sparse_index) { > + if (pathspec->magic || pathspec->has_wildcard) { > + ensure_full_index(&the_index); dir.c has the notion of matching the characters preceding the wildcard characters; look for "no_wildcard_len". If the pathspec doesn't match a path up to no_wildcard_len, then the wildcard character(s) later in the pathspec can't make the pathspec match that path. It might at least be worth mentioning this as a possible future optimization. > + } else { > + for (i = 0; i < pathspec->nr; i++) { > + if (!path_in_cone_mode_sparse_checkout(pathspec->items[i].original, &the_index) && > + !matches_skip_worktree(pathspec, i, &skip_worktree_seen)) { What if the pathspec corresponds to a sparse-directory in the index, but possibly without the trailing '/' character? e.g.: git reset HEAD~1 -- sparse-directory One should be able to reset that directory without recursing into it...does this code handle that? Does it handle it if we add the trailing slash on the path for the reset command line? > + ensure_full_index(&the_index); > + break; > + } > + } > + } > + } > + > + free(skip_worktree_seen); > > - ensure_full_index(&the_index); > if (do_diff_cache(tree_oid, &opt)) > return 1; > diffcore_std(&opt); > diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh > index e301ef5633a..4afcbc2d673 100755 > --- a/t/t1092-sparse-checkout-compatibility.sh > +++ b/t/t1092-sparse-checkout-compatibility.sh > @@ -804,11 +804,22 @@ test_expect_success 'sparse-index is not expanded' ' > ensure_not_expanded reset --hard $ref || return 1 > done && > > + ensure_not_expanded reset --mixed base && > ensure_not_expanded reset --hard update-deep && > ensure_not_expanded reset --keep base && > ensure_not_expanded reset --merge update-deep && > - ensure_not_expanded reset --hard && This commit was only touching the --mixed case; why is it removing one of the tests for --hard? > > + ensure_not_expanded reset base -- deep/a && > + ensure_not_expanded reset base -- nonexistent-file && > + ensure_not_expanded reset deepest -- deep && > + > + # Although folder1 is outside the sparse definition, it exists as a > + # directory entry in the index, so it will be reset without needing to > + # expand the full index. Ah, I think this answers one of my earlier questions. Does it also work with 'folder1/' as well as 'folder1'? > + ensure_not_expanded reset --hard update-folder1 && Wait...is update-folder1 a branch or a path? And if this commit is about --mixed, why are --hard testcases being added? > + ensure_not_expanded reset base -- folder1 && > + > + ensure_not_expanded reset --hard update-deep && another --hard testcase...was this an accidental squash by chance? > ensure_not_expanded checkout -f update-deep && > test_config -C sparse-index pull.twohead ort && > ( > -- > gitgitgadget >
Elijah Newren wrote: >> + /* >> + * When pathspec is given for resetting a cone-mode sparse checkout, it may >> + * identify entries that are nested in sparse directories, in which case the >> + * index should be expanded. For the sake of efficiency, this check is >> + * overly-cautious: anything with a wildcard or a magic prefix requires >> + * expansion, as well as literal paths that aren't in the sparse checkout >> + * definition AND don't match any directory in the index. > > s/efficiency/efficiency of checking/ ? Being overly-cautious suggests > you'll expand to a full index more than is needed, and full indexes > are more expensive. But perhaps the checking would be expensive too > so you have a tradeoff? > > Or maybe s/efficiency/simplicity/? > "Simplicity" is probably more appropriate, although the original intent was "efficiency of checking". I wanted to avoid repeated iteration over the index (for example, matching the `no_wildcard_len` of each wildcard pathspec item against each sparse directory in the index). However, to your point, expanding the index is a more expensive operation anyway, so it's probably worth the more involved checks. >> + */ >> + if (pathspec->nr && the_index.sparse_index) { >> + if (pathspec->magic || pathspec->has_wildcard) { >> + ensure_full_index(&the_index); > > dir.c has the notion of matching the characters preceding the wildcard > characters; look for "no_wildcard_len". If the pathspec doesn't match > a path up to no_wildcard_len, then the wildcard character(s) later in > the pathspec can't make the pathspec match that path. > > It might at least be worth mentioning this as a possible future optimization. > I'll incorporate a something like this into the next version. >> + } else { >> + for (i = 0; i < pathspec->nr; i++) { >> + if (!path_in_cone_mode_sparse_checkout(pathspec->items[i].original, &the_index) && >> + !matches_skip_worktree(pathspec, i, &skip_worktree_seen)) { > > What if the pathspec corresponds to a sparse-directory in the index, > but possibly without the trailing '/' character? e.g.: > > git reset HEAD~1 -- sparse-directory > > One should be able to reset that directory without recursing into > it...does this code handle that? Does it handle it if we add the > trailing slash on the path for the reset command line? > It handles both cases (with and without trailing slash), the former due to `!matches_skip_worktree(...)` and the latter due to `!path_in_cone_mode_sparse_checkout(...)`. >> + ensure_full_index(&the_index); >> + break; >> + } >> + } >> + } >> + } >> + >> + free(skip_worktree_seen); >> >> - ensure_full_index(&the_index); >> if (do_diff_cache(tree_oid, &opt)) >> return 1; >> diffcore_std(&opt); >> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh >> index e301ef5633a..4afcbc2d673 100755 >> --- a/t/t1092-sparse-checkout-compatibility.sh >> +++ b/t/t1092-sparse-checkout-compatibility.sh >> @@ -804,11 +804,22 @@ test_expect_success 'sparse-index is not expanded' ' >> ensure_not_expanded reset --hard $ref || return 1 >> done && >> >> + ensure_not_expanded reset --mixed base && >> ensure_not_expanded reset --hard update-deep && >> ensure_not_expanded reset --keep base && >> ensure_not_expanded reset --merge update-deep && >> - ensure_not_expanded reset --hard && > > This commit was only touching the --mixed case; why is it removing one > of the tests for --hard? > [...] >> + ensure_not_expanded reset --hard update-folder1 && > > Wait...is update-folder1 a branch or a path? And if this commit is > about --mixed, why are --hard testcases being added? > >> + ensure_not_expanded reset base -- folder1 && >> + >> + ensure_not_expanded reset --hard update-deep && > > another --hard testcase...was this an accidental squash by chance? > I included `git reset --hard` between the "actual" test cases so that the `git reset --mixed` tests would start in a "clean" state (clear out any modified files), but it's unnecessary in most cases so I'll remove them in V3. To answer your other question, `update-folder1` is a branch.
diff --git a/builtin/reset.c b/builtin/reset.c index e1f2a2bb2c4..ceb9b122897 100644 --- a/builtin/reset.c +++ b/builtin/reset.c @@ -175,6 +175,8 @@ static int read_from_tree(const struct pathspec *pathspec, int intent_to_add) { struct diff_options opt; + unsigned int i; + char *skip_worktree_seen = NULL; memset(&opt, 0, sizeof(opt)); copy_pathspec(&opt.pathspec, pathspec); @@ -182,9 +184,35 @@ static int read_from_tree(const struct pathspec *pathspec, opt.format_callback = update_index_from_diff; opt.format_callback_data = &intent_to_add; opt.flags.override_submodule_config = 1; + opt.flags.recursive = 1; opt.repo = the_repository; + opt.change = diff_change; + opt.add_remove = diff_addremove; + + /* + * When pathspec is given for resetting a cone-mode sparse checkout, it may + * identify entries that are nested in sparse directories, in which case the + * index should be expanded. For the sake of efficiency, this check is + * overly-cautious: anything with a wildcard or a magic prefix requires + * expansion, as well as literal paths that aren't in the sparse checkout + * definition AND don't match any directory in the index. + */ + if (pathspec->nr && the_index.sparse_index) { + if (pathspec->magic || pathspec->has_wildcard) { + ensure_full_index(&the_index); + } else { + for (i = 0; i < pathspec->nr; i++) { + if (!path_in_cone_mode_sparse_checkout(pathspec->items[i].original, &the_index) && + !matches_skip_worktree(pathspec, i, &skip_worktree_seen)) { + ensure_full_index(&the_index); + break; + } + } + } + } + + free(skip_worktree_seen); - ensure_full_index(&the_index); if (do_diff_cache(tree_oid, &opt)) return 1; diffcore_std(&opt); diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index e301ef5633a..4afcbc2d673 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -804,11 +804,22 @@ test_expect_success 'sparse-index is not expanded' ' ensure_not_expanded reset --hard $ref || return 1 done && + ensure_not_expanded reset --mixed base && ensure_not_expanded reset --hard update-deep && ensure_not_expanded reset --keep base && ensure_not_expanded reset --merge update-deep && - ensure_not_expanded reset --hard && + ensure_not_expanded reset base -- deep/a && + ensure_not_expanded reset base -- nonexistent-file && + ensure_not_expanded reset deepest -- deep && + + # Although folder1 is outside the sparse definition, it exists as a + # directory entry in the index, so it will be reset without needing to + # expand the full index. + ensure_not_expanded reset --hard update-folder1 && + ensure_not_expanded reset base -- folder1 && + + ensure_not_expanded reset --hard update-deep && ensure_not_expanded checkout -f update-deep && test_config -C sparse-index pull.twohead ort && (