restore: invalidate cache-tree when removing entries with --staged
diff mbox series

Message ID 20200108114344.GA3380580@coredump.intra.peff.net
State New
Headers show
Series
  • restore: invalidate cache-tree when removing entries with --staged
Related show

Commit Message

Jeff King Jan. 8, 2020, 11:43 a.m. UTC
On Wed, Jan 08, 2020 at 05:40:08AM -0500, Jeff King wrote:

> So there seem to be at least two bugs:
> 
>  - git-restore doesn't properly invalidate the cache-tree
> 
>  - the index-reading code is not careful enough about bogus cache-trees,
>    and may segfault

Here's a fix for the first one. I'm adding Junio to the cc as an expert
in index and cache-tree issues. I'm pretty sure this is the correct fix,
but I have some lingering questions below.

I'm not planning on working on the second one immediately. Between this
and Emily's patch from yesterday, I have a feeling that the index code
could use an audit to be a bit more careful about handling bogus on-disk
data.

-- >8 --
Subject: restore: invalidate cache-tree when removing entries with --staged

When "git restore --staged <path>" removes a path that's in the index,
it marks the entry with CE_REMOVE, but we don't do anything to
invalidate the cache-tree. In the non-staged case, we end up in
checkout_worktree(), which calls remove_marked_cache_entries(). That
actually drops the entries from the index, as well as invalidating the
cache-tree and untracked-cache.

But with --staged, we never call checkout_worktree(), and the CE_REMOVE
entries remain. Interestingly, they are dropped when we write out the
index, but that means the resulting index is inconsistent: its
cache-tree will not match the actual entries, and running "git commit"
immediately after will create the wrong tree.

We can solve this by calling remove_marked_cache_entries() ourselves
before writing out the index. Note that we can't just hoist it out of
checkout_worktree(); that function needs to iterate over the CE_REMOVE
entries (to drop their matching worktree files) before removing them.

One curiosity about the test: without this patch, it actually triggers a
BUG() when running git-restore:

  BUG: cache-tree.c:810: new1 with flags 0x4420000 should not be in cache-tree

But in the original problem report, which used a similar recipe,
git-restore actually creates the bogus index (and the commit is created
with the wrong tree). I'm not sure why the test here behaves differently
than my out-of-suite reproduction, but what's here should catch either
symptom (and the fix corrects both cases).

Reported-by: Torsten Krah <krah.tm@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
---
So I'm mildly puzzled about the BUG thing above. But also: it seems
really weird that do_write_index() will drop CE_REMOVE entries as it
writes, but not invalidate the cache-tree (or the untracked-cache, which
AFAICT is probably similarly affected). Should it be doing that
invalidation? Or should it be a BUG() to write out an index without
having done remove_marked_cache_entries()?

 builtin/checkout.c |  2 ++
 t/t2070-restore.sh | 18 ++++++++++++++++++
 2 files changed, 20 insertions(+)

Comments

Junio C Hamano Jan. 8, 2020, 3:41 p.m. UTC | #1
Jeff King <peff@peff.net> writes:

> diff --git a/builtin/checkout.c b/builtin/checkout.c
> index b52c490c8f..18ef5fb975 100644
> --- a/builtin/checkout.c
> +++ b/builtin/checkout.c
> @@ -524,6 +524,8 @@ static int checkout_paths(const struct checkout_opts *opts,
>  	/* Now we are committed to check them out */
>  	if (opts->checkout_worktree)
>  		errs |= checkout_worktree(opts);
> +	else
> +		remove_marked_cache_entries(&the_index, 1);

Ah, I was wondering why we were seeing breakages on cache-tree,
which is fairly old and stable part of the system---even though it
had caused us quite a lot of pain while it was growing---all of a
sudden.  No wonder---this codepath is a fairly new one, introduced
when "restore" was added X-<.

And the fix looks right, of course.  Thanks for extracting a
reproducible recipe out of the original reporter and quickly
diagnosing it.  Very much appreciated.
Dennis Kaarsemaker Feb. 5, 2020, 8:24 p.m. UTC | #2
On Wed, 2020-01-08 at 06:43 -0500, Jeff King wrote:
> On Wed, Jan 08, 2020 at 05:40:08AM -0500, Jeff King wrote:
> 
> > So there seem to be at least two bugs:
> > 
> >  - git-restore doesn't properly invalidate the cache-tree
> > 
> >  - the index-reading code is not careful enough about bogus cache-trees,
> >    and may segfault
> 
> Here's a fix for the first one. I'm adding Junio to the cc as an expert
> in index and cache-tree issues. I'm pretty sure this is the correct fix,
> but I have some lingering questions below.
> 
> I'm not planning on working on the second one immediately. Between this
> and Emily's patch from yesterday, I have a feeling that the index code
> could use an audit to be a bit more careful about handling bogus on-disk
> data.

We just ran into something similar where git would create really bogus
commits when mixing squash merges and restore. As it's a private repo,
I don't have an exact recipe for reproducing it, but it roughly goes
like:

git checkout master
git merge --squash branch-for-which-we-want-to-redo-commits
git restore --staged .
git add file1 file2 file3
git commit -m "commit"

This commit would remove a file4 that wasn't even touched in the branch
(further commits would do even more broken things, eventually leading
to broken commit objects with duplicate file contents). Changing `git
restore --staged .` to `git reset HEAD` made this behaviour go away. A
quick search in the list brought this patch and I'm happy to say it
fixes our issue as well.

Thanks Peff!

D.

Patch
diff mbox series

diff --git a/builtin/checkout.c b/builtin/checkout.c
index b52c490c8f..18ef5fb975 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -524,6 +524,8 @@  static int checkout_paths(const struct checkout_opts *opts,
 	/* Now we are committed to check them out */
 	if (opts->checkout_worktree)
 		errs |= checkout_worktree(opts);
+	else
+		remove_marked_cache_entries(&the_index, 1);
 
 	/*
 	 * Allow updating the index when checking out from the index.
diff --git a/t/t2070-restore.sh b/t/t2070-restore.sh
index 21c3f84459..06a5976143 100755
--- a/t/t2070-restore.sh
+++ b/t/t2070-restore.sh
@@ -96,6 +96,7 @@  test_expect_success 'restore --ignore-unmerged ignores unmerged entries' '
 '
 
 test_expect_success 'restore --staged adds deleted intent-to-add file back to index' '
+	test_when_finished git reset --hard &&
 	echo "nonempty" >nonempty &&
 	>empty &&
 	git add nonempty empty &&
@@ -106,4 +107,21 @@  test_expect_success 'restore --staged adds deleted intent-to-add file back to in
 	git diff --cached --exit-code
 '
 
+test_expect_success 'restore --staged invalidates cache tree for deletions' '
+	test_when_finished git reset --hard &&
+	>new1 &&
+	>new2 &&
+	git add new1 new2 &&
+
+	# It is important to commit and then reset here, so that the index
+	# contains a valid cache-tree for the "both" tree.
+	git commit -m both &&
+	git reset --soft HEAD^ &&
+
+	git restore --staged new1 &&
+	git commit -m "just new2" &&
+	git rev-parse HEAD:new2 &&
+	test_must_fail git rev-parse HEAD:new1
+'
+
 test_done