
[v2,0/5] Sparse Index: Integrate with 'git add'

Message ID: pull.999.v2.git.1627312727.gitgitgadget@gmail.com

Message

Derrick Stolee via GitGitGadget July 26, 2021, 3:18 p.m. UTC
This patch series re-submits the 'git add' integration with sparse-index.
The performance gains are the same as before.

It is based on ds/commit-and-checkout-with-sparse-index.

This series was delayed from its initial submission for a couple of reasons.

The first was that it collided with some changes in
mt/add-rm-in-sparse-checkout. We are now far enough along that that
branch is in our history, so we can work forwards.

The other concern was about how 'git add <pathspec>' should respond when a
path outside of the sparse-checkout cone exists. One recommendation (sorry,
I can't find a link to the message) was to disallow adding files that
would become index entries with SKIP_WORKTREE set. However, as I worked
towards that goal, I found that the change would cause problems for a
realistic scenario: merge conflicts outside of the sparse-checkout cone.

Update: Elijah points out that the SKIP_WORKTREE bit is removed from
conflicted files, which allows adding them without warning.
(However, we also need to be careful about untracked files, as documented in
the test added here.)
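To make the SKIP_WORKTREE point concrete, here is a simplified C model of a check that refuses only paths whose index entry has SKIP_WORKTREE set. The types and function names are hypothetical and are not Git's struct cache_entry or the actual code in builtin/add.c. Conflicted entries have the bit cleared, so they pass; an untracked file has no index entry at all, so it passes too, which is the gap the untracked-file case in the test documents.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for an index entry. */
struct entry {
	const char *name;
	unsigned stage;     /* 0 = resolved; 1..3 = merge-conflict stages */
	bool skip_worktree; /* models the SKIP_WORKTREE bit */
};

/* Return the first entry named path, or NULL if the path is untracked. */
static const struct entry *find_entry(const struct entry *idx, size_t n,
				      const char *path)
{
	for (size_t i = 0; i < n; i++)
		if (!strcmp(idx[i].name, path))
			return &idx[i];
	return NULL;
}

/*
 * Refuse only tracked paths with SKIP_WORKTREE set. Conflicted entries
 * have the bit cleared, so they pass; untracked paths have no entry to
 * inspect, so they pass as well, even when they lie outside the cone.
 */
static bool add_allowed_by_skip_worktree_check(const struct entry *idx,
					       size_t n, const char *path)
{
	const struct entry *e;

	assert(path != NULL);
	e = find_entry(idx, n, path);
	return !e || !e->skip_worktree;
}
```

With folder1/a conflicted (stages 1 to 3, bit cleared) and folder1/b a clean sparse entry, this check admits folder1/a and an untracked folder1/new but refuses folder1/b.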

The first patch of this series adds tests that create merge conflicts
outside of the sparse cone and then presents different ways a user could
resolve the situation. We want all of them to be feasible, and this
includes:

 1. Reverting the file to a known version in history.
 2. Adding the file with its contents on disk.
 3. Moving the file to a new location in the sparse directory.

The one place I did continue to update is 'git add --refresh <pathspec>',
to match the behavior added by mt/add-rm-in-sparse-checkout, which outputs
an error message. This happens even when the file exists in the working directory,
but that seems appropriate enough.
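A rough C sketch of the refresh behavior described above. It assumes a simplified entry type and uses a plain prefix comparison as a stand-in for real pathspec matching (both hypothetical). Matching entries with SKIP_WORKTREE are reported instead of refreshed, and the decision looks only at the bit, never at whether the file exists on disk. The message text is placeholder wording, not Git's actual error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

struct entry {
	const char *name;
	bool skip_worktree;
	bool on_disk; /* deliberately not consulted by refresh_matching() */
};

/*
 * Refresh every entry whose name starts with prefix (a stand-in for
 * pathspec matching). SKIP_WORKTREE entries are reported and counted,
 * even if the file happens to exist in the worktree. Returns the number
 * of skipped paths; a nonzero result means the command should fail.
 */
static int refresh_matching(struct entry *idx, size_t n, const char *prefix,
			    int *refreshed)
{
	int skipped = 0;

	assert(prefix != NULL && refreshed != NULL);
	*refreshed = 0;
	for (size_t i = 0; i < n; i++) {
		if (strncmp(idx[i].name, prefix, strlen(prefix)))
			continue;
		if (idx[i].skip_worktree) {
			fprintf(stderr, "outside sparse-checkout: %s\n",
				idx[i].name);
			skipped++;
			continue;
		}
		(*refreshed)++; /* real code would re-stat the file here */
	}
	return skipped;
}
```

Keying the error on the bit rather than on worktree presence is what makes the result the same whether or not folder1/a happens to exist on disk.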


Updates in V2
=============

 * Test comments in patch 1 are improved.

 * The test hunk that was removed in patch 2 and reintroduced in the old
   patch 4 is modified to clarify how the behavior changes with that patch.
   Later patches in the series then modify the test further.

 * Another instance of ensure_full_index() is removed from the --renormalize
   option. This option already ignored files with the SKIP_WORKTREE bit, so
   this should be an obviously-correct removal.

 * A full proposal for what to do with "git (add|mv|rm)" and paths outside
   the cone is delayed to another series (with an RFC round) because the
   behavior of the sparse-index matches a full index with sparse-checkout.
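The --renormalize bullet rests on one property of the sparse-index format: a sparse-directory entry, which stands in for a whole collapsed subtree, carries the SKIP_WORKTREE bit itself. The C sketch below uses hypothetical types (not the code in builtin/add.c) to show why a loop that already skips SKIP_WORKTREE entries never needs to look inside a sparse directory, which is what makes dropping ensure_full_index() safe there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical index entry. */
struct entry {
	const char *name;
	bool skip_worktree;
	bool sparse_dir; /* collapsed directory entry in a sparse index */
};

/*
 * Collect the entries a --renormalize style pass would process.
 * SKIP_WORKTREE entries are skipped; since sparse-directory entries
 * carry that bit, they are skipped by the same test and the index
 * never has to be expanded first.
 */
static size_t renormalize_candidates(const struct entry *idx, size_t n,
				     const char **out)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		if (idx[i].skip_worktree)
			continue;
		/* in this model, a sparse directory always has the bit */
		assert(!idx[i].sparse_dir);
		out[count++] = idx[i].name;
	}
	return count;
}
```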

Thanks, -Stolee

Derrick Stolee (5):
  t1092: test merge conflicts outside cone
  add: allow operating on a sparse-only index
  pathspec: stop calling ensure_full_index
  add: ignore outside the sparse-checkout in refresh()
  add: remove ensure_full_index() with --renormalize

 builtin/add.c                            | 15 ++++--
 pathspec.c                               |  2 -
 t/t1092-sparse-checkout-compatibility.sh | 62 ++++++++++++++++++++----
 3 files changed, 65 insertions(+), 14 deletions(-)


base-commit: 71e301501c88399711a1bf8515d1747e92cfbb9b
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-999%2Fderrickstolee%2Fsparse-index%2Fadd-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-999/derrickstolee/sparse-index/add-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/999

Range-diff vs v1:

 1:  a763a7d15b8 ! 1:  8f2fd9370fe t1092: test merge conflicts outside cone
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'merge' '
      +	test_all_match test_must_fail git merge -m merge merge-right &&
      +	test_all_match git status --porcelain=v2 &&
      +
     -+	# resolve the conflict in different ways:
     -+	# 1. revert to the base
     ++	# Resolve the conflict in different ways:
     ++	# 1. Revert to the base
      +	test_all_match git checkout base -- deep/deeper2/a &&
      +	test_all_match git status --porcelain=v2 &&
      +
     -+	# 2. add the file with conflict markers
     ++	# 2. Add the file with conflict markers
      +	test_all_match git add folder1/a &&
      +	test_all_match git status --porcelain=v2 &&
      +
     -+	# 3. rename the file to another sparse filename
     ++	# 3. Rename the file to another sparse filename and
     ++	#    accept conflict markers as resolved content.
      +	run_on_all mv folder2/a folder2/z &&
      +	test_all_match git add folder2 &&
      +	test_all_match git status --porcelain=v2 &&
 2:  791c6c2c9ad ! 2:  6e43f118fa0 add: allow operating on a sparse-only index
     @@ Commit message
          sparse-index. Comparing to the full index case, 'git add -A' goes from
          0.37s to 0.05s, which is "only" an 86% improvement.
      
     +    This modification to 'git add' creates some behavior change depending on
     +    the use of a sparse index. We modify a test in t1092 to demonstrate
     +    these changes which will be remedied in future changes.
     +
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## builtin/add.c ##
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'status/add: outsi
      -	# This "git add folder1/a" fails with a warning
      -	# in the sparse repos, differing from the full
      -	# repo. This is intentional.
     --	test_sparse_match test_must_fail git add folder1/a &&
     ++	# Adding the path outside of the sparse-checkout cone should fail.
     + 	test_sparse_match test_must_fail git add folder1/a &&
      -	test_sparse_match test_must_fail git add --refresh folder1/a &&
      -	test_all_match git status --porcelain=v2 &&
     --
     ++
     ++	test_must_fail git -C sparse-checkout add --refresh folder1/a 2>sparse-checkout-err &&
     ++	test_must_fail git -C sparse-index add --refresh folder1/a 2>sparse-index-err &&
     ++	# NEEDSWORK: A sparse index changes the error message.
     ++	! test_cmp sparse-checkout-err sparse-index-err &&
     ++
     ++	# NEEDSWORK: Adding a newly-tracked file outside the cone succeeds
     ++	test_sparse_match git add folder1/new &&
     + 
       	test_all_match git add . &&
       	test_all_match git status --porcelain=v2 &&
       	test_all_match git commit -m folder1/new &&
     ++	test_all_match git rev-parse HEAD^{tree} &&
     + 
     + 	run_on_all ../edit-contents folder1/newer &&
     + 	test_all_match git add folder1/ &&
     + 	test_all_match git status --porcelain=v2 &&
     +-	test_all_match git commit -m folder1/newer
     ++	test_all_match git commit -m folder1/newer &&
     ++	test_all_match git rev-parse HEAD^{tree}
     + '
     + 
     + test_expect_success 'checkout and reset --hard' '
      @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse-index is not expanded' '
       	git -C sparse-index reset --hard &&
       	ensure_not_expanded checkout rename-out-to-out -- deep/deeper1 &&
 3:  a577ea4c74d = 3:  2ae91e0af29 pathspec: stop calling ensure_full_index
 4:  89ec6a7ce67 < -:  ----------- t1092: 'git add --refresh' difference with sparse-index
 5:  76066a78ce0 ! 4:  a79728d4c64 add: ignore outside the sparse-checkout in refresh()
     @@ builtin/add.c: static int refresh(int verbose, const struct pathspec *pathspec)
      
       ## t/t1092-sparse-checkout-compatibility.sh ##
      @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'status/add: outside sparse cone' '
     - 	test_all_match git commit -m folder1/newer
     - '
       
     --test_expect_failure 'add: pathspec within sparse directory' '
     -+test_expect_success 'add: pathspec within sparse directory' '
     - 	init_repos &&
     - 
     - 	run_on_sparse mkdir folder1 &&
     -@@ t/t1092-sparse-checkout-compatibility.sh: test_expect_failure 'add: pathspec within sparse directory' '
     - 	# This "git add folder1/a" fails with a warning
     - 	# in the sparse repos, differing from the full
     - 	# repo. This is intentional.
     --	#
     --	# However, in the sparse-index, folder1/a does not
     --	# match any cache entry and fails with a different
     --	# error message. This needs work.
     + 	# Adding the path outside of the sparse-checkout cone should fail.
       	test_sparse_match test_must_fail git add folder1/a &&
     - 	test_sparse_match test_must_fail git add --refresh folder1/a &&
     - 	test_all_match git status --porcelain=v2
     +-
     +-	test_must_fail git -C sparse-checkout add --refresh folder1/a 2>sparse-checkout-err &&
     +-	test_must_fail git -C sparse-index add --refresh folder1/a 2>sparse-index-err &&
     +-	# NEEDSWORK: A sparse index changes the error message.
     +-	! test_cmp sparse-checkout-err sparse-index-err &&
     ++	test_sparse_match test_must_fail git add --refresh folder1/a &&
     + 
     + 	# NEEDSWORK: Adding a newly-tracked file outside the cone succeeds
     + 	test_sparse_match git add folder1/new &&
 -:  ----------- > 5:  1543550a4e8 add: remove ensure_full_index() with --renormalize

Comments

Elijah Newren July 28, 2021, 11:13 p.m. UTC | #1
On Mon, Jul 26, 2021 at 9:18 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> ...
>  * a full proposal for what to do with "git (add|mv|rm)" and paths outside
>    the cone is delayed to another series (with an RFC round) because the
>    behavior of the sparse-index matches a full index with sparse-checkout.

I think this makes sense.

I've read through the patches, and I like this version...with one
exception.  Can we mark the test added in patch 1 under

     # 3. Rename the file to another sparse filename and
     #    accept conflict markers as resolved content.

as NEEDSWORK or even MAYNEEDWORK?  I'm still quite unconvinced that it
is testing for correct behavior, and don't want to paint ourselves
into a corner.  In particular, we don't allow folks to "git add
$IGNORED_FILE" without a --force override because it's likely to be a
mistake.  I think the same logic holds for adding untracked files
outside the sparsity cone.  But it's actually even worse than that
case because there's a secondary level of surprise too: adding files
outside the sparsity cone will result in delayed user surprises when
the next git command that happens to call unpack_trees() (which are
found all over the codebase) removes the file from the working tree.
I've had some such reports already.
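The delayed surprise can be modeled in a few lines of C. This is a hypothetical simplification, not unpack_trees() itself: it assumes every entry is clean and treats a single directory prefix as the cone. A later pass that re-applies the sparsity patterns sets SKIP_WORKTREE on an entry outside the cone and removes its worktree copy, so a file the user just added disappears.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct entry {
	const char *name;
	bool skip_worktree;
	bool on_disk;
};

/* Hypothetical cone test: in cone if the path starts with cone_dir. */
static bool in_cone(const char *path, const char *cone_dir)
{
	return !strncmp(path, cone_dir, strlen(cone_dir));
}

/*
 * Model of a later command re-applying the sparsity patterns to clean
 * entries: anything outside the cone gets SKIP_WORKTREE and loses its
 * worktree copy. Returns how many files were removed from the worktree.
 */
static int reapply_sparsity(struct entry *idx, size_t n, const char *cone_dir)
{
	int removed = 0;

	assert(cone_dir != NULL);
	for (size_t i = 0; i < n; i++) {
		if (in_cone(idx[i].name, cone_dir) || idx[i].skip_worktree)
			continue;
		idx[i].skip_worktree = true;
		if (idx[i].on_disk) {
			idx[i].on_disk = false;
			removed++;
		}
	}
	return removed;
}
```

Starting from deep/a inside the cone and a freshly added folder1/new outside it, one pass removes exactly folder1/new from the worktree.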

If that test is marked as NEEDSWORK or even as the correct behavior
still being under dispute, then you can happily apply my:

Reviewed-by: Elijah Newren <newren@gmail.com>

Derrick Stolee July 29, 2021, 2:03 a.m. UTC | #2
On 7/28/2021 7:13 PM, Elijah Newren wrote:
> On Mon, Jul 26, 2021 at 9:18 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
...
>>  * a full proposal for what to do with "git (add|mv|rm)" and paths outside
>>    the cone is delayed to another series (with an RFC round) because the
>>    behavior of the sparse-index matches a full index with sparse-checkout.
> 
> I think this makes sense.
> 
> I've read through the patches, and I like this version...with one
> exception.  Can we mark the test added in patch 1 under
> 
>      # 3. Rename the file to another sparse filename and
>      #    accept conflict markers as resolved content.
> 
> as NEEDSWORK or even MAYNEEDWORK?

I have no objection to adding a blurb such as:

	# NEEDSWORK: allowing adds outside the sparse cone can be
	# confusingto users, as the file can disappear from the
	# worktree without warning in later Git commands.

And perhaps I'm misunderstanding the situation a bit, but that
seems to apply not just to this third case, but all of them. I
don't see why the untracked case is special compared to the
tracked case. More investigation may be required on my part.

>  I'm still quite unconvinced that it
> is testing for correct behavior, and don't want to paint ourselves
> into a corner.  In particular, we don't allow folks to "git add
> $IGNORED_FILE" without a --force override because it's likely to be a
> mistake. 

I agree about ignored files, and that is true whether or not they
are in the sparse cone.

> I think the same logic holds for adding untracked files
> outside the sparsity cone.  But it's actually even worse than that
> case because there's a secondary level of surprise too: adding files
> outside the sparsity cone will result in delayed user surprises when
> the next git command that happens to call unpack_trees() (which are
> found all over the codebase) removes the file from the working tree.
> I've had some such reports already.

I believe this is testing a realistic scenario that users are
hitting in the wild today. I would believe that users succeed with
these commands more often than they are confused by the file
disappearing from the worktree in a later Git command, so having
this sequence of events be documented as a potential use case has
some value.

I simultaneously don't think it is behavior we want to commit to
as a contract for all future Git versions, but there is value in
showing how this situation changes with any future meddling. In
particular: will users be able to self-discover the "new" way of
doing things?

The proposal part of changing how add/mv/rm behave in these cases
would need to adjust this test with something that would also help
direct users to a helpful resolution. For example, the first run
of

	git add sparse/dir/file

could error out with an error message saying "The pathspec is
outside of your sparse cone, so staging the file might lead to
a staged change that is removed from your working directory."
But we should _also_ include two strategies for getting out of
this state:

1. Adjust your sparse-checkout definition so this file is in scope.

-or- (and this is the part that would be new)

2. If you understand the risks of staging a file outside the sparse
   cone, then run 'git add --sparse sparse/dir/file'.

(Insert whatever option would be appropriate for --sparse here.)

Such a warning message would allow users who follow the steps listed
in the test to know how to adjust their usage to then get into a
good state.
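A C sketch of the proposed check and message, assuming a hypothetical helper and the placeholder '--sparse' option named in the proposal. The add is refused only when the path is outside the cone and no override was given, and the message lists both ways forward.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Returns 0 if the add may proceed, -1 if it is refused; on refusal,
 * msg receives an explanation with both suggested resolutions.
 * "--sparse" is the placeholder option name from the proposal.
 */
static int check_add_outside_cone(const char *path, bool in_cone,
				  bool override, char *msg, size_t len)
{
	assert(path != NULL && msg != NULL && len > 0);
	msg[0] = '\0';
	if (in_cone || override)
		return 0;
	snprintf(msg, len,
		 "'%s' is outside of your sparse cone; staging it might lead to\n"
		 "a staged change that is removed from your working directory.\n"
		 "Either:\n"
		 "  1. adjust your sparse-checkout definition so it is in scope, or\n"
		 "  2. if you understand the risks, run 'git add --sparse %s'.\n",
		 path, path);
	return -1;
}
```

In a test like the one under discussion, the first add would expect the refusal and the second would pass the override and then continue with the existing expectations.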

> If that test is marked as NEEDSWORK or even as the correct behavior
> still being under dispute, then you can happily apply my:

I would classify this as "The test documents current behavior, but
isn't a contract for future behavior." With a concept such as my
suggestion above, the test could be modified to check for the
warning and then run the second command with the extra option and
complete the test's expectations. Having the existing behavior
documented in a test helps demonstrate how behavior is changing.

As we've discussed, we want to give such a behavior change the
right venue for feedback and suggestions for alternate approaches,
and this series is not the right place for that. Hopefully you
can tell that it is on my mind and that I want to recommend a
change in the near future.

Thanks,
-Stolee
Elijah Newren July 29, 2021, 2:57 a.m. UTC | #3
On Wed, Jul 28, 2021 at 8:03 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 7/28/2021 7:13 PM, Elijah Newren wrote:
> > On Mon, Jul 26, 2021 at 9:18 AM Derrick Stolee via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
> ...
> >>  * a full proposal for what to do with "git (add|mv|rm)" and paths outside
> >>    the cone is delayed to another series (with an RFC round) because the
> >>    behavior of the sparse-index matches a full index with sparse-checkout.
> >
> > I think this makes sense.
> >
> > I've read through the patches, and I like this version...with one
> > exception.  Can we mark the test added in patch 1 under
> >
> >      # 3. Rename the file to another sparse filename and
> >      #    accept conflict markers as resolved content.
> >
> > as NEEDSWORK or even MAYNEEDWORK?
>
> I have no objection to adding a blurb such as:
>
>         # NEEDSWORK: allowing adds outside the sparse cone can be
>         # confusingto users, as the file can disappear from the
>         # worktree without warning in later Git commands.
>

Sounds great to me other than the simple typo (s/confusingto/confusing to/)

> And perhaps I'm misunderstanding the situation a bit, but that
> seems to apply not just to this third case, but all of them. I
> don't see why the untracked case is special compared to the
> tracked case. More investigation may be required on my part.

The possible cases for files outside the sparsity patterns are:
  a) untracked
  b) tracked and SKIP_WORKTREE
  c) tracked and !SKIP_WORKTREE (e.g. because of merge conflicts)

From the above set, we've been talking about untracked and I think
we're on the same page about those.  Case (b) was already corrected by
Matheus a number of releases back; git-add will throw an error
explaining the situation and prevent the adding.  The error tells the
user to expand their sparsity set to work on those files.  For case
(c), you are right that those are problematic in the same way (they
can disappear later after a git-add)...but we're also in the situation
where the only way to get rid of the conflicting stages is to run git
add.  So, in my mind, case (c) puts us between a rock and a hard
place, and we probably need to allow the git-add.
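The three cases, together with one reading of the positions in this thread, fit in a small C decision table. The enum and function names are hypothetical, and the mapping for (a) and (c) reflects the views expressed here rather than settled Git behavior.

```c
#include <assert.h>
#include <stdbool.h>

/* Cases for a path outside the sparsity patterns. */
enum outside_case {
	OUTSIDE_UNTRACKED,             /* (a) */
	OUTSIDE_TRACKED_SKIP_WORKTREE, /* (b) */
	OUTSIDE_TRACKED_IN_WORKTREE,   /* (c) e.g. a conflict cleared the bit */
};

enum add_decision { ADD_ALLOW, ADD_REFUSE, ADD_NEEDS_OVERRIDE };

static enum outside_case classify(bool tracked, bool skip_worktree)
{
	assert(tracked || !skip_worktree); /* untracked paths have no bit */
	if (!tracked)
		return OUTSIDE_UNTRACKED;
	return skip_worktree ? OUTSIDE_TRACKED_SKIP_WORKTREE
			     : OUTSIDE_TRACKED_IN_WORKTREE;
}

/*
 * (b) is already refused; (a) is treated like an ignored file and needs
 * an override; (c) is allowed because 'git add' is the only way to
 * clear the conflict stages.
 */
static enum add_decision decide(enum outside_case c)
{
	switch (c) {
	case OUTSIDE_UNTRACKED:
		return ADD_NEEDS_OVERRIDE;
	case OUTSIDE_TRACKED_SKIP_WORKTREE:
		return ADD_REFUSE;
	case OUTSIDE_TRACKED_IN_WORKTREE:
		return ADD_ALLOW;
	}
	return ADD_REFUSE; /* unreachable for valid enum values */
}
```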

> >  I'm still quite unconvinced that it
> > is testing for correct behavior, and don't want to paint ourselves
> > into a corner.  In particular, we don't allow folks to "git add
> > $IGNORED_FILE" without a --force override because it's likely to be a
> > mistake.
>
> I agree about ignored files, and that is true whether or not they
> are in the sparse cone.

Yes, and...

> > I think the same logic holds for adding untracked files
> > outside the sparsity cone.

In my opinion, "outside the sparsity cone" is another form of "being
ignored", and in my mind should be treated similarly -- it should
generally require an override to add such files.  (Case (c) possibly
being an exception, though maybe even it shouldn't be.)

> >  But it's actually even worse than that
> > case because there's a secondary level of surprise too: adding files
> > outside the sparsity cone will result in delayed user surprises when
> > the next git command that happens to call unpack_trees() (which are
> > found all over the codebase) removes the file from the working tree.
> > I've had some such reports already.
>
> I believe this is testing a realistic scenario that users are
> hitting in the wild today. I would believe that users succeed with
> these commands more often than they are confused by the file
> disappearing from the worktree in a later Git command, so having
> this sequence of events be documented as a potential use case has
> some value.
>
> I simultaneously don't think it is behavior we want to commit to
> as a contract for all future Git versions, but there is value in
> showing how this situation changes with any future meddling. In
> particular: will users be able to self-discover the "new" way of
> doing things?

Oh, I totally agree that documenting how things work definitely has
value.  I've added several test_expect_failure cases and whatnot to
the testsuite.  But there's a big difference between documenting how
things work and documenting how we expect them to work.  If the two
differ, then any good provided by documenting how things work with a
test marked as test_expect_success may be counterbalanced or even
overwhelmed by the harm it also causes, particularly in areas where
working around backward compatibility constraints is more difficult.

For example, not that long ago, it seemed people agreed (even Junio)
that commit hooks were never intended to be part of rebase (they
aren't part of the apply backend, and were only part of the
merge/interactive backend due to historical accident) and could be
removed (being replaced by just a rebase hook called at the end of the
rebase instead of with every commit).  There were user complaints
about the commit hooks being triggered when the default backend
switched, backing up the expectation.  But no one jumped in to fix it
at the time.  Then when it was brought up again recently, Junio said
we couldn't just remove those because of backward compatibility.
That's forcing me to consider suggesting a bunch of new arguments to
rebase to let users get unbroken when they discover they need it, or
maybe even a new toplevel command because we painted ourselves into a
corner (there are more backward compatibility corners in rebase
too...).

Trying to get out of a corner we paint ourselves into with
sparse-checkout would be massively harder, which is why I keep harping
on this kind of thing.  I'm very concerned it's happening even despite
my numerous comments and worries about it.

> The proposal part of changing how add/mv/rm behave in these cases
> would need to adjust this test with something that would also help
> direct users to a helpful resolution. For example, the first run
> of
>
>         git add sparse/dir/file
>
> could error out with an error message saying "The pathspec is
> outside of your sparse cone, so staging the file might lead to
> a staged change that is removed from your working directory."

Yes, much like we currently do with tracked files which are SKIP_WORKTREE.

> But we should _also_ include two strategies for getting out of
> this state:
>
> 1. Adjust your sparse-checkout definition so this file is in scope.
>
> -or- (and this is the part that would be new)
>
> 2. If you understand the risks of staging a file outside the sparse
>    cone, then run 'git add --sparse sparse/dir/file'.
>
> (Insert whatever option would be appropriate for --sparse here.)
>
> Such a warning message would allow users who follow the steps listed
> in the test to know how to adjust their usage to then get into a
> good state.

Choice 2 doesn't exist yet, but yeah your suggestion makes sense.

> > If that test is marked as NEEDSWORK or even as the correct behavior
> > still being under dispute, then you can happily apply my:
>
> I would classify this as "The test documents current behavior, but
> isn't a contract for future behavior." With a concept such as my
> suggestion above, the test could be modified to check for the
> warning and then run the second command with the extra option and
> complete the test's expectations. Having the existing behavior
> documented in a test helps demonstrate how behavior is changing.
>
> As we've discussed, we want to give such a behavior change the
> right venue for feedback and suggestions for alternate approaches,
> and this series is not the right place for that. Hopefully you
> can tell that it is on my mind and that I want to recommend a
> change in the near future.

I'm totally fine with such changes not being part of this series.  I
just don't want a test_expect_success that checks for behavior that I
consider buggy unless it comes with a disclaimer that it's checking
for existing rather than expected behavior.
Derrick Stolee July 29, 2021, 2:49 p.m. UTC | #4
On 7/28/2021 10:57 PM, Elijah Newren wrote:
> On Wed, Jul 28, 2021 at 8:03 PM Derrick Stolee <stolee@gmail.com> wrote:
>>
>> On 7/28/2021 7:13 PM, Elijah Newren wrote:
>>> On Mon, Jul 26, 2021 at 9:18 AM Derrick Stolee via GitGitGadget
>>> <gitgitgadget@gmail.com> wrote:
>> ...
>>>>  * a full proposal for what to do with "git (add|mv|rm)" and paths outside
>>>>    the cone is delayed to another series (with an RFC round) because the
>>>>    behavior of the sparse-index matches a full index with sparse-checkout.
>>>
>>> I think this makes sense.
>>>
>>> I've read through the patches, and I like this version...with one
>>> exception.  Can we mark the test added in patch 1 under
>>>
>>>      # 3. Rename the file to another sparse filename and
>>>      #    accept conflict markers as resolved content.
>>>
>>> as NEEDSWORK or even MAYNEEDWORK?
>>
>> I have no objection to adding a blurb such as:
>>
>>         # NEEDSWORK: allowing adds outside the sparse cone can be
>>         # confusingto users, as the file can disappear from the
>>         # worktree without warning in later Git commands.
>>
> 
> Sounds great to me other than the simple typo (s/confusingto/confusing to/)
> 
>> And perhaps I'm misunderstanding the situation a bit, but that
>> seems to apply not just to this third case, but all of them. I
>> don't see why the untracked case is special compared to the
>> tracked case. More investigation may be required on my part.
> 
> The possible cases for files outside the sparsity patterns are:
>   a) untracked
>   b) tracked and SKIP_WORKTREE
>   c) tracked and !SKIP_WORKTREE (e.g. because of merge conflicts)
> 
> From the above set, we've been talking about untracked and I think
> we're on the same page about those.  Case (b) was already corrected by
> Matheus a number of releases back; git-add will throw an error
> explaining the situation and prevent the adding.  The error tells the
> user to expand their sparsity set to work on those files.  For case
> (c), you are right that those are problematic in the same way (they
> can disappear later after a git-add)...but we're also in the situation
> where the only way to get rid of the conflicting stages is to run git
> add.  So, in my mind, case (c) puts us between a rock and a hard
> place, and we probably need to allow the git-add.
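The three cases above can be told apart from the index alone. Here is a minimal sketch. It assumes the path is already known to lie outside the sparsity patterns, and it relies only on the status tags printed by `git ls-files -t`. `classify_sparse_path` is a hypothetical helper for illustration, not part of Git:

```shell
#!/bin/sh
# Hypothetical helper (not part of Git): classify a path that is ASSUMED
# to lie outside the sparse-checkout patterns into cases (a), (b) or (c).
# It reads the status tag that `git ls-files -t` prints for index entries:
#   S = tracked and SKIP_WORKTREE
#   M = tracked and unmerged (conflicted entries have SKIP_WORKTREE removed)
#   H = any other tracked entry
# An untracked path is not in the index, so nothing is printed for it.
classify_sparse_path () {
	# Unmerged paths print one line per stage; only the first tag matters.
	tag=$(git ls-files -t -- "$1" | head -n 1 | cut -c1)
	case "$tag" in
	"") echo "a: untracked" ;;
	S) echo "b: tracked and SKIP_WORKTREE" ;;
	*) echo "c: tracked and !SKIP_WORKTREE" ;;
	esac
}
```

Run it from the top of the worktree, for example `classify_sparse_path sparse/dir/file`. Only case (a) and case (c) leave a file on disk that `git add` could pick up.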

I appreciate this additional context. Thanks.
 
>>>  I'm still quite unconvinced that it
>>> is testing for correct behavior, and don't want to paint ourselves
>>> into a corner.  In particular, we don't allow folks to "git add
>>> $IGNORED_FILE" without a --force override because it's likely to be a
>>> mistake.
>>
>> I agree about ignored files, and that is true whether or not they
>> are in the sparse cone.
> 
> Yes, and...
> 
>>> I think the same logic holds for adding untracked files
>>> outside the sparsity cone.
> 
> In my opinion, "outside the sparsity cone" is another form of "being
> ignored", and in my mind should be treated similarly -- it should
> generally require an override to add such files.  (Case (c) possibly
> being an exception, though maybe even it shouldn't be.)

I don't hold that same interpretation. I think of it instead as
"hidden" files, but they still matter. I also think that advising
one to adjust their sparsity patterns might be dangerous because
not all users know the ramifications of doing that. They might
accidentally download an enormous amount of data to correct a
single file.

Having an override seems like the best option, and we can hopefully
make it consistent across all the cases and commands.

...

> Trying to get out of a corner we paint ourselves into with
> sparse-checkout would be massively harder, which is why I keep harping
> on this kind of thing.  I'm very concerned it's happening even despite
> my numerous comments and worries about it.
...
> I'm totally fine with such changes not being part of this series.  I
> just don't want a test_expect_success that checks for behavior that I
> consider buggy unless it comes with a disclaimer that it's checking
> for existing rather than expected behavior.

I understand your perspective. I'll send a v3 soon that adds a
comment on top of the entire test signalling the things we talked
about here: this is a documentation of behavior, not an endorsement,
and we should probably change it because users can get confused.

Thanks,
-Stolee
Elijah Newren July 30, 2021, 12:52 p.m. UTC | #5
On Thu, Jul 29, 2021 at 8:49 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 7/28/2021 10:57 PM, Elijah Newren wrote:
> > On Wed, Jul 28, 2021 at 8:03 PM Derrick Stolee <stolee@gmail.com> wrote:
> >>
> >> On 7/28/2021 7:13 PM, Elijah Newren wrote:
> >>> On Mon, Jul 26, 2021 at 9:18 AM Derrick Stolee via GitGitGadget
> >>> <gitgitgadget@gmail.com> wrote:
...
> >> I agree about ignored files, and that is true whether or not they
> >> are in the sparse cone.
> >
> > Yes, and...
> >
> >>> I think the same logic holds for adding untracked files
> >>> outside the sparsity cone.
> >
> > In my opinion, "outside the sparsity cone" is another form of "being
> > ignored", and in my mind should be treated similarly -- it should
> > generally require an override to add such files.  (Case (c) possibly
> > being an exception, though maybe even it shouldn't be.)
>
> I don't hold that same interpretation. I think of it instead as
> "hidden" files, but they still matter. I also think that advising
> one to adjust their sparsity patterns might be dangerous because
> not all users know the ramifications of doing that. They might
> accidentally download an enormous amount of data to correct a
> single file.
>
> Having an override seems like the best option, and we can hopefully
> make it consistent across all the cases and commands.

I think we might be arguing two sides of the same coin at this point.
We don't have a more general term for "special in a way that it shouldn't
be included by default with git-add", and I couldn't think of a good
synonym, so I used the words "another form of being ignored" (not
trying to imply that it was the same as .gitignored, but just that the
two were special in a very similar way) while you tried to highlight
the differences using "hidden" but agreed they were similar in that
they should have an override.

Fair point on adjusting sparsity patterns and the data download it can cause.

> ...
>
> > Trying to get out of a corner we paint ourselves into with
> > sparse-checkout would be massively harder, which is why I keep harping
> > on this kind of thing.  I'm very concerned it's happening even despite
> > my numerous comments and worries about it.
> ...
> > I'm totally fine with such changes not being part of this series.  I
> > just don't want a test_expect_success that checks for behavior that I
> > consider buggy unless it comes with a disclaimer that it's checking
> > for existing rather than expected behavior.
>
> I understand your perspective. I'll send a v3 soon that adds a
> comment on top of the entire test signalling the things we talked
> about here: this is a documentation of behavior, not an endorsement,
> and we should probably change it because users can get confused.

Thanks for doing that.  :-)