Message ID | 22c69bc60308fef13acd7c3aab4e11e175c89440.1633440057.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Sparse Index: integrate with reset | expand |
"Kevin Willford via GitGitGadget" <gitgitgadget@gmail.com> writes: > When using the sparse checkout feature, 'git reset' will add entries to > the index that will have the skip-worktree bit off but will leave the > working directory empty. File data is lost because the index version of > the files has been changed but there is nothing that is in the working > directory. This will cause the next 'git status' call to show either > deleted for files modified or deleting or nothing for files added. The > added files should be shown as untracked and modified files should be > shown as modified. I am on vacation today, so let me be brief. Let me see if I am understanding the situation correctly. We have the index, with a path that records a blob, but the path is marked with skip-wortree bit. $ rm -fr test && mkdir test && cd test $ git init . $ date >no-skip $ date >skip $ git add no-skip skip $ git commit -m initial 2 files changed, 2 insertions(+) create mode 100644 no-skip create mode 100644 skip $ date >no-skip $ date >skip $ git add no-skip skip $ git update-index --skip-wortree skip $ rm skip $ git commit -m second [master e9088ad] second 2 files changed, 2 insertions(+), 2 deletions(-) $ ls *skip no-skip $ git ls-files -t H no-skip S skip $ git status On branch master nothing to commit, working tree clean Note. There is no 'reset' done yet so far. The user is happy with the state because (1) The user marked the path "skip" with skip-worktree bit, and thanks to that, even though "skip" is absent in the working tree, the "git status" does not complain. (2) The user marked the path "skip" with skip-worktree bit because the user did not want to see such a file in the working tree. And "git commit -m second", "git ls-files -t", or "git status" that were done to get here did not make it materialize in the working tree all of sudden. And then the user says "git reset HEAD^" to switch to a different commit. $ git reset HEAD^ $ ls *skip no-skip $ git ls-files -t M no-skip D skip $ git status -suno M no-skip D skip The user is unhappy with the state because "skip" is shown as lost. Do I understand the situation you are trying to deal with correctly? > To fix this when the reset is running if there is not a file in the > working directory and if it will be missing with the new index entry or > was not missing in the previous version, we create the previous index > version of the file in the working directory so that status will report > correctly and the files will be availble for the user to deal with. Assuming I read the problem description correctly, I am highly skeptical that the above is a correct approach to keep the user happy. Yes, if you created a working tree file with contents that match the blob recorded for the path in the initial commit when "reset HEAD^" is done, you may keep "git status" quiet, so (1) above will be kept, but what about (2)? The user marked the path with "skip" but, because the path should not appear on the working tree. The "fix" is countermanding that wish by the user, isn't it? Wouldn't a fix to the situation be to * Add the blob for "skip" taken from the initial commit to the index, just like the entry for "no-skip" is updated; * But remember that "skip" was marked with "skip-worktree" bit immediately before "git reset" was asked to do its thing, and re-add the bit to the path in the index before "git reset" gives the control back to the usre; * And keep the working tree untouched, without writing anything out to "skip". If the user had a (possibly unrelated) file there, it will not be overwritten, and if the user left the path absent, it will still be absent. so that the last three diagnostic commands in the above sample sequence would instead read: $ ls *skip no-skip $ git ls-files -t M no-skip S skip $ git status -suno M no-skip i.e. skip gets updated in the index only, nothing changes in the working tree for "skip" or "no-skip", and status reports that "no-skip" is different from the index but "skip" hasn't changed in the working tree since the index (thanks to its skip-worktree bit). Then the user will be happy in the same way as the user was happy immediately after the state marked with "There is no 'reset' done yet so far." above, on both counts, not just for "status does not report something got changed" part but also "user didn't want to see 'skip' in the working tree, and 'skip' did not materialize" part. Thanks.
Junio C Hamano wrote: > Wouldn't a fix to the situation be to > > * Add the blob for "skip" taken from the initial commit to the > index, just like the entry for "no-skip" is updated; > > * But remember that "skip" was marked with "skip-worktree" bit > immediately before "git reset" was asked to do its thing, and > re-add the bit to the path in the index before "git reset" gives > the control back to the usre; > > * And keep the working tree untouched, without writing anything out > to "skip". If the user had a (possibly unrelated) file there, it > will not be overwritten, and if the user left the path absent, it > will still be absent. > > so that the last three diagnostic commands in the above sample > sequence would instead read: > > $ ls *skip > no-skip > $ git ls-files -t > M no-skip > S skip > $ git status -suno > M no-skip > > i.e. skip gets updated in the index only, nothing changes in the > working tree for "skip" or "no-skip", and status reports that > "no-skip" is different from the index but "skip" hasn't changed in > the working tree since the index (thanks to its skip-worktree bit). > > Then the user will be happy in the same way as the user was happy > immediately after the state marked with "There is no 'reset' done > yet so far." above, on both counts, not just for "status does not > report something got changed" part but also "user didn't want to see > 'skip' in the working tree, and 'skip' did not materialize" part. > > Thanks. > Thanks for the thorough explanation, I'm on-board with your approach (and will re-roll the series with that implemented). A lot of my thought process (and confusion) came from a comment in e5ca291076 (t1092: document bad sparse-checkout behavior, 2021-07-14) suggesting that full and sparse checkouts should have the same result in scenarios like the one you outlined above. The problem is, as noted earlier, it's impossible to tell whether (using your example): 1. the user deleted `skip` because they intentionally want to remove it from the worktree, and it should continue to be deleted after a reset. 2. `skip` doesn't exist in the worktree because it's excluded from the sparse checkout definition and the user does not want its current state "deleted" after a reset. As a result, there's no way `git reset --mixed` could be expected to behave the same way in full checkouts as it does in sparse, and the most consistent solution is that the worktree should remain untouched with `skip-worktree` preserved.
Hi! It appears Junio has already commented on this patch and in more detail, but since I already typed up some comments I'll send them along in case they are useful. On Tue, Oct 5, 2021 at 6:20 AM Kevin Willford via GitGitGadget <gitgitgadget@gmail.com> wrote: > > From: Kevin Willford <kewillf@microsoft.com> > > When using the sparse checkout feature, 'git reset' will add entries to > the index that will have the skip-worktree bit off but will leave the > working directory empty. Yes, that seems like a problem. > File data is lost because the index version of > the files has been changed but there is nothing that is in the working > directory. This will cause the next 'git status' call to show either > deleted for files modified or deleting or nothing for files added. The > added files should be shown as untracked and modified files should be > shown as modified. Why is the solution to add the files to the working tree rather than to make sure the files have the skip-worktree bit set? That's not at all what I would have expected. > To fix this when the reset is running if there is not a file in the > working directory and if it will be missing with the new index entry or > was not missing in the previous version, we create the previous index > version of the file in the working directory so that status will report > correctly and the files will be availble for the user to deal with. s/availble/available/ > > This fixes a documented failure from t1092 that was created in 19a0acc > (t1092: test interesting sparse-checkout scenarios, 2021-01-23). > > Signed-off-by: Kevin Willford <kewillf@microsoft.com> > Signed-off-by: Derrick Stolee <dstolee@microsoft.com> > Signed-off-by: Victoria Dye <vdye@github.com> > --- > builtin/reset.c | 24 ++++++++-- > t/t1092-sparse-checkout-compatibility.sh | 4 +- > t/t7114-reset-sparse-checkout.sh | 61 ++++++++++++++++++++++++ > 3 files changed, 83 insertions(+), 6 deletions(-) > create mode 100755 t/t7114-reset-sparse-checkout.sh > > diff --git a/builtin/reset.c b/builtin/reset.c > index 51c9e2f43ff..3b75d3b2f20 100644 > --- a/builtin/reset.c > +++ b/builtin/reset.c > @@ -25,6 +25,8 @@ > #include "cache-tree.h" > #include "submodule.h" > #include "submodule-config.h" > +#include "dir.h" > +#include "entry.h" > > #define REFRESH_INDEX_DELAY_WARNING_IN_MS (2 * 1000) > > @@ -130,11 +132,27 @@ static void update_index_from_diff(struct diff_queue_struct *q, > int intent_to_add = *(int *)data; > > for (i = 0; i < q->nr; i++) { > + int pos; > struct diff_filespec *one = q->queue[i]->one; > - int is_missing = !(one->mode && !is_null_oid(&one->oid)); > + struct diff_filespec *two = q->queue[i]->two; > + int is_in_reset_tree = one->mode && !is_null_oid(&one->oid); Isn't !is_null_oid(&one->oid) redundant to checking one->mode? When does the diff machinery ever give you a non-zero mode with a null oid? Also, is_in_reset_tree == !is_missing; I'll note that below. > struct cache_entry *ce; > > + /* > + * If the file being reset has `skip-worktree` enabled, we need > + * to check it out to prevent the file from being hard reset. I don't understand this comment. If the file wasn't originally in the index (is_missing), and is being added to it, and is correctly marked as skip_worktree, and the file isn't in the working tree, then it sounds like everything is already in a good state. Files outside the sparse checkout are meant to have the skip_worktree bit set and be missing from the working tree. Also, I don't know what you mean by 'hard reset' here. > + */ > + pos = cache_name_pos(two->path, strlen(two->path)); > + if (pos >= 0 && ce_skip_worktree(active_cache[pos])) { > + struct checkout state = CHECKOUT_INIT; > + state.force = 1; > + state.refresh_cache = 1; > + state.istate = &the_index; > + > + checkout_entry(active_cache[pos], &state, NULL, NULL); Does this introduce an error in the opposite direction from the one stated in the commit message? Namely we have two things that should be in sync: the skip_worktree flag stating whether the file should be present in the working directory (skip_worktree), and the question of whether the file is actually in the working directory. In the commit message, you pointed out a case where the y were out of sync one way: the skip_worktree flag was not set but the file was missing. Here you say the skip_worktree flag is set, but you add it to the working tree anyway. Or am I misunderstanding the code? > + } > + [I did some slight editing to the diff to make the next two parts appear next to each other] > - if (is_missing && !intent_to_add) { > + if (!is_in_reset_tree && !intent_to_add) { I thought this was some subtle bugfix or something, and spent a while trying to figure it out, before realizing that is_in_reset_tree was simply defined as !is_missing (for some reason I was assuming it was dealing with two->mode while is_missing was looking at one->mode). So this is a simple variable renaming, which I think is probably good, but I'd prefer if this was separated into a different patch to make it easier to review. > remove_file_from_cache(one->path); > continue; > } > @@ -144,7 +162,7 @@ static void update_index_from_diff(struct diff_queue_struct *q, > if (!ce) > die(_("make_cache_entry failed for path '%s'"), > one->path); > - if (is_missing) { > + if (!is_in_reset_tree) { same note as above; the variable rename is good, but should be a separate patch. > ce->ce_flags |= CE_INTENT_TO_ADD; > set_object_name_for_intent_to_add_entry(ce); > } > diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh > index 886e78715fe..c5977152661 100755 > --- a/t/t1092-sparse-checkout-compatibility.sh > +++ b/t/t1092-sparse-checkout-compatibility.sh > @@ -459,9 +459,7 @@ test_expect_failure 'blame with pathspec outside sparse definition' ' > test_all_match git blame deep/deeper2/deepest/a > ' > > -# NEEDSWORK: a sparse-checkout behaves differently from a full checkout > -# in this scenario, but it shouldn't. > -test_expect_failure 'checkout and reset (mixed)' ' > +test_expect_success 'checkout and reset (mixed)' ' > init_repos && > > test_all_match git checkout -b reset-test update-deep && > diff --git a/t/t7114-reset-sparse-checkout.sh b/t/t7114-reset-sparse-checkout.sh > new file mode 100755 > index 00000000000..a8029707fb1 > --- /dev/null > +++ b/t/t7114-reset-sparse-checkout.sh > @@ -0,0 +1,61 @@ > +#!/bin/sh > + > +test_description='reset when using a sparse-checkout' > + > +. ./test-lib.sh > + > +test_expect_success 'setup' ' > + test_tick && > + echo "checkout file" >c && > + echo "modify file" >m && > + echo "delete file" >d && > + git add . && > + git commit -m "initial commit" && > + echo "added file" >a && > + echo "modification of a file" >m && > + git rm d && > + git add . && > + git commit -m "second commit" && > + git checkout -b endCommit > +' > + > +test_expect_success 'reset when there is a sparse-checkout' ' > + echo "/c" >.git/info/sparse-checkout && > + test_config core.sparsecheckout true && > + git checkout -B resetBranch && > + test_path_is_missing m && > + test_path_is_missing a && > + test_path_is_missing d && > + git reset HEAD~1 && > + echo "checkout file" >expect && > + test_cmp expect c && > + echo "added file" >expect && > + test_cmp expect a && > + echo "modification of a file" >expect && > + test_cmp expect m && > + test_path_is_missing d > +' > + > +test_expect_success 'reset after deleting file without skip-worktree bit' ' > + git checkout -f endCommit && > + git clean -xdf && > + cat >.git/info/sparse-checkout <<-\EOF && > + /c > + /m > + EOF > + test_config core.sparsecheckout true && > + git checkout -B resetAfterDelete && > + test_path_is_file m && > + test_path_is_missing a && > + test_path_is_missing d && > + rm -f m && > + git reset HEAD~1 && > + echo "checkout file" >expect && > + test_cmp expect c && > + echo "added file" >expect && > + test_cmp expect a && > + test_path_is_missing m && > + test_path_is_missing d > +' > + > +test_done > -- > gitgitgadget >
On 05/10/21 20.20, Kevin Willford via GitGitGadget wrote: > When using the sparse checkout feature, 'git reset' will add entries to > the index that will have the skip-worktree bit off but will leave the > working directory empty. File data is lost because the index version of > the files has been changed but there is nothing that is in the working > directory. This will cause the next 'git status' call to show either > deleted for files modified or deleting or nothing for files added. The > added files should be shown as untracked and modified files should be > shown as modified. > Better say `... but there is nothing in the working directory`. > To fix this when the reset is running if there is not a file in the > working directory and if it will be missing with the new index entry or > was not missing in the previous version, we create the previous index > version of the file in the working directory so that status will report > correctly and the files will be availble for the user to deal with. > s/availble/available
Victoria Dye <vdye@github.com> writes: > Thanks for the thorough explanation, I'm on-board with your approach (and > will re-roll the series with that implemented). A lot of my thought process > (and confusion) came from a comment in e5ca291076 (t1092: document bad > sparse-checkout behavior, 2021-07-14) suggesting that full and sparse > checkouts should have the same result in scenarios like the one you > outlined above. Thanks for bringing this up. I agree that it is crucial to clarify what use case we are aiming for. If the objective were to make a sparse checkout behave just like full checkout, the desired behaviour would be very different from a system whose objective is to allow users to pretend as if the hidden parts of sparse checkout do not even exist, which was the model my example was after. I agree with you that the "comment" in an earlier commit may have been unhelpful in that they stopped at "should behave the same but they shouldn't" without saying "why they should behave the same". If the goal were to make sparse behave like full, continuing with the previous example, after a $ git reset --mixed HEAD^ the user should be able to say $ git commit -a --amend to replace the original two-commit history with a single commit history that records the same resulting tree. If the path "skip" were to be reset to the blob from the first commit, just like the path "no-skip" is, for such a "commit -a --amend" to work, we would need to have a working tree file for "skip" magically materialized with the contents from the *second* commit. After all, the whole point of mixed (and soft) reset is that they do not (logically) change the files in the working tree, so if you are resetting from the second commit to the first, if you were to have a working tree file, it should come from the second commit, so that both "skip" and "no-skip" should show "changed in the working tree relative to the index", i.e. $ git reset --mixed HEAD^ $ git ls-files -t M no-skip M skip While such a "make sparse behave the same way as full" can be made internally consistent, however, as the above example shows, it would make the resulting "sparse checkout" practically unusable. By stepping back a bit and realizing that the reason why the user wanted to mark some path as "skip-worktree" was because the user had no intention to make any change to them, we can make it usable again, by not insisting that sparse should behave the same way as full. When we redesign these patches, I would like to see what we failed short the last time gets improved. Instead of saying "skip-worktree entries should stay so" and stopping there, we should leave a note for later readers to explain why they should. Thanks.
Elijah Newren wrote: >> - int is_missing = !(one->mode && !is_null_oid(&one->oid)); >> + struct diff_filespec *two = q->queue[i]->two; >> + int is_in_reset_tree = one->mode && !is_null_oid(&one->oid); > > Isn't !is_null_oid(&one->oid) redundant to checking one->mode? When > does the diff machinery ever give you a non-zero mode with a null oid? > It looks like this originally only checked the mode, and the extra OID check was introduced in ff00b682f2 (reset [<commit>] paths...: do not mishandle unmerged paths, 2011-07-13). I was able to remove `!is_null_oid(&one->oid)` from the condition and run the `t71*` tests without any failures, but I'm hesitant to remove it on the off chance that this handles a case I'm not thinking of. > Also, is_in_reset_tree == !is_missing; I'll note that below. > >> struct cache_entry *ce; >> >> + /* >> + * If the file being reset has `skip-worktree` enabled, we need >> + * to check it out to prevent the file from being hard reset. > > I don't understand this comment. If the file wasn't originally in the > index (is_missing), and is being added to it, and is correctly marked > as skip_worktree, and the file isn't in the working tree, then it > sounds like everything is already in a good state. Files outside the > sparse checkout are meant to have the skip_worktree bit set and be > missing from the working tree. > > Also, I don't know what you mean by 'hard reset' here. > >> + */ >> + pos = cache_name_pos(two->path, strlen(two->path)); >> + if (pos >= 0 && ce_skip_worktree(active_cache[pos])) { >> + struct checkout state = CHECKOUT_INIT; >> + state.force = 1; >> + state.refresh_cache = 1; >> + state.istate = &the_index; >> + >> + checkout_entry(active_cache[pos], &state, NULL, NULL); > > Does this introduce an error in the opposite direction from the one > stated in the commit message? Namely we have two things that should > be in sync: the skip_worktree flag stating whether the file should be > present in the working directory (skip_worktree), and the question of > whether the file is actually in the working directory. In the commit > message, you pointed out a case where the y were out of sync one way: > the skip_worktree flag was not set but the file was missing. Here you > say the skip_worktree flag is set, but you add it to the working tree > anyway. > > Or am I misunderstanding the code? > Most of this is addressed in [1], and you're right that what's in this patch isn't the right fix for the problem. This patch tried to solve the issue of "skip-worktree is being ignored and reset files are showing up deleted" by continuing to ignore `skip-worktree`, but now checking out the `skip-worktree` files based on their pre-reset state in the index (unless they, for some reason, were already present in the worktree). However, that completely disregards the reasoning for having `skip-worktree` in the first place (the user wants the file *ignored* in the worktree) and violates the premise of `git reset --mixed` not modifying the worktree, so the better solution is to set `skip-worktree` in the resulting index entry and not check out anything. [1] https://lore.kernel.org/git/9b99e856-24cc-03fd-7871-de92dc6e39b6@github.com/ >> + } >> + > > [I did some slight editing to the diff to make the next two parts > appear next to each other] > >> - if (is_missing && !intent_to_add) { >> + if (!is_in_reset_tree && !intent_to_add) { > > I thought this was some subtle bugfix or something, and spent a while > trying to figure it out, before realizing that is_in_reset_tree was > simply defined as !is_missing (for some reason I was assuming it was > dealing with two->mode while is_missing was looking at one->mode). So > this is a simple variable renaming, which I think is probably good, > but I'd prefer if this was separated into a different patch to make it > easier to review. > Good call, I'll include this in V3.
diff --git a/builtin/reset.c b/builtin/reset.c index 51c9e2f43ff..3b75d3b2f20 100644 --- a/builtin/reset.c +++ b/builtin/reset.c @@ -25,6 +25,8 @@ #include "cache-tree.h" #include "submodule.h" #include "submodule-config.h" +#include "dir.h" +#include "entry.h" #define REFRESH_INDEX_DELAY_WARNING_IN_MS (2 * 1000) @@ -130,11 +132,27 @@ static void update_index_from_diff(struct diff_queue_struct *q, int intent_to_add = *(int *)data; for (i = 0; i < q->nr; i++) { + int pos; struct diff_filespec *one = q->queue[i]->one; - int is_missing = !(one->mode && !is_null_oid(&one->oid)); + struct diff_filespec *two = q->queue[i]->two; + int is_in_reset_tree = one->mode && !is_null_oid(&one->oid); struct cache_entry *ce; - if (is_missing && !intent_to_add) { + /* + * If the file being reset has `skip-worktree` enabled, we need + * to check it out to prevent the file from being hard reset. + */ + pos = cache_name_pos(two->path, strlen(two->path)); + if (pos >= 0 && ce_skip_worktree(active_cache[pos])) { + struct checkout state = CHECKOUT_INIT; + state.force = 1; + state.refresh_cache = 1; + state.istate = &the_index; + + checkout_entry(active_cache[pos], &state, NULL, NULL); + } + + if (!is_in_reset_tree && !intent_to_add) { remove_file_from_cache(one->path); continue; } @@ -144,7 +162,7 @@ static void update_index_from_diff(struct diff_queue_struct *q, if (!ce) die(_("make_cache_entry failed for path '%s'"), one->path); - if (is_missing) { + if (!is_in_reset_tree) { ce->ce_flags |= CE_INTENT_TO_ADD; set_object_name_for_intent_to_add_entry(ce); } diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index 886e78715fe..c5977152661 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -459,9 +459,7 @@ test_expect_failure 'blame with pathspec outside sparse definition' ' test_all_match git blame deep/deeper2/deepest/a ' -# NEEDSWORK: a sparse-checkout behaves differently from a full checkout -# in this scenario, but it shouldn't. -test_expect_failure 'checkout and reset (mixed)' ' +test_expect_success 'checkout and reset (mixed)' ' init_repos && test_all_match git checkout -b reset-test update-deep && diff --git a/t/t7114-reset-sparse-checkout.sh b/t/t7114-reset-sparse-checkout.sh new file mode 100755 index 00000000000..a8029707fb1 --- /dev/null +++ b/t/t7114-reset-sparse-checkout.sh @@ -0,0 +1,61 @@ +#!/bin/sh + +test_description='reset when using a sparse-checkout' + +. ./test-lib.sh + +test_expect_success 'setup' ' + test_tick && + echo "checkout file" >c && + echo "modify file" >m && + echo "delete file" >d && + git add . && + git commit -m "initial commit" && + echo "added file" >a && + echo "modification of a file" >m && + git rm d && + git add . && + git commit -m "second commit" && + git checkout -b endCommit +' + +test_expect_success 'reset when there is a sparse-checkout' ' + echo "/c" >.git/info/sparse-checkout && + test_config core.sparsecheckout true && + git checkout -B resetBranch && + test_path_is_missing m && + test_path_is_missing a && + test_path_is_missing d && + git reset HEAD~1 && + echo "checkout file" >expect && + test_cmp expect c && + echo "added file" >expect && + test_cmp expect a && + echo "modification of a file" >expect && + test_cmp expect m && + test_path_is_missing d +' + +test_expect_success 'reset after deleting file without skip-worktree bit' ' + git checkout -f endCommit && + git clean -xdf && + cat >.git/info/sparse-checkout <<-\EOF && + /c + /m + EOF + test_config core.sparsecheckout true && + git checkout -B resetAfterDelete && + test_path_is_file m && + test_path_is_missing a && + test_path_is_missing d && + rm -f m && + git reset HEAD~1 && + echo "checkout file" >expect && + test_cmp expect c && + echo "added file" >expect && + test_cmp expect a && + test_path_is_missing m && + test_path_is_missing d +' + +test_done