diff mbox series

[2/3] cache-tree: detect mismatching number of index entries

Message ID 595693a6420b2571aabd51ed989bedfa0cfa62e2.1726556195.git.ps@pks.im (mailing list archive)
State Superseded
Headers show
Series cache-tree: fix segfaults with invalid cache-trees | expand

Commit Message

Patrick Steinhardt Sept. 17, 2024, 7:13 a.m. UTC
In t4058 we have some tests that exercise git-read-tree(1) when used
with a tree that contains duplicate entries. While the expectation is
that we fail, we ideally should fail gracefully without a segfault.

But that is not the case: we never check that the number of entries in
the cache-tree is less than or equal to the number of entries in the
index. This can lead to an out-of-bounds read as we unconditionally
access `istate->cache[idx]`, where `idx` is controlled by the number of
cache-tree entries and the current position therein. The result is a
segfault.

Fix this segfault by adding a sanity check for the number of index
entries before dereferencing them.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 cache-tree.c               |  5 +++++
 t/t4058-diff-duplicates.sh | 12 ++++++------
 2 files changed, 11 insertions(+), 6 deletions(-)

Comments

Junio C Hamano Sept. 19, 2024, 1:35 a.m. UTC | #1
Patrick Steinhardt <ps@pks.im> writes:

> +	if (it->entry_count + pos > istate->cache_nr) {
> +		ret = error(_("corrupted cache-tree has entries not present in index"));
> +		goto out;
> +	}

Is it a safe assumption that the if() condition always indicates an
error?  When sparse-index is in effect, istate->cache_nr may be a
number that is smaller than the true number of paths in the index
(because all paths under a subdirectory we are not interested in are
folded into a single tree-ish entry), and I am not sure how it
should interact with it->entry_count (i.e. the number of paths under
the current directory we are looking at, which obviously cannot be a
sparsified entry) and pos (i.e. the index into active_cache[] that
represend the first path under the current directory)?

I guess as long as "it" is not folded, it does not matter how other
paths from different directories in active_cache[] are sparsified or
expanded, as long as "pos" keeps track of the current position
correctly.

> diff --git a/t/t4058-diff-duplicates.sh b/t/t4058-diff-duplicates.sh
> index 2501c89c1c9..3f602adb055 100755
> --- a/t/t4058-diff-duplicates.sh
> +++ b/t/t4058-diff-duplicates.sh
> @@ -132,15 +132,15 @@ test_expect_success 'create a few commits' '
>  	rm commit_id up final
>  '
>  
> -test_expect_failure 'git read-tree does not segfault' '
> -	test_when_finished rm .git/index.lock &&
> -	test_might_fail git read-tree --reset base
> +test_expect_success 'git read-tree does not segfault' '
> +	test_must_fail git read-tree --reset base 2>err &&
> +	test_grep "error: corrupted cache-tree has entries not present in index" err
>  '

Very good.  test_might_fail is a sign of trouble, and this gives us
a lot more predictability.
Patrick Steinhardt Sept. 24, 2024, 6:48 a.m. UTC | #2
On Wed, Sep 18, 2024 at 06:35:35PM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > +	if (it->entry_count + pos > istate->cache_nr) {
> > +		ret = error(_("corrupted cache-tree has entries not present in index"));
> > +		goto out;
> > +	}
> 
> Is it a safe assumption that the if() condition always indicates an
> error?  When sparse-index is in effect, istate->cache_nr may be a
> number that is smaller than the true number of paths in the index
> (because all paths under a subdirectory we are not interested in are
> folded into a single tree-ish entry), and I am not sure how it
> should interact with it->entry_count (i.e. the number of paths under
> the current directory we are looking at, which obviously cannot be a
> sparsified entry) and pos (i.e. the index into active_cache[] that
> represend the first path under the current directory)?
> 
> I guess as long as "it" is not folded, it does not matter how other
> paths from different directories in active_cache[] are sparsified or
> expanded, as long as "pos" keeps track of the current position
> correctly.

It seems like we end up calling `ensure_full_index()` for a sparse
index, which does cause us to signal to the caller that they should
restart verification. So for all I understand, this function shouldn't
act on a sparsely-populated index.

But I cannot see how it could lead to anything sensible when the added
condition is violated because the first thing we do in the loop is this:

	struct cache_entry *ce = istate->cache[pos + i];

And before we do anything else, we dereference that pointer. So if the
condition doesn't hold we _will_ get an out-of-bounds read of the cache
array and act on the garbage data. And that causes the observed segfault
on my machine and in the test.

So I think that ensuring this property is always the right thing to do.
But I wouldn't be surprised if overall this code could require more love
to make it behave sanely in all scenarios. It certainly feels somewhat
fragile to me.

Patrick
Junio C Hamano Sept. 24, 2024, 5:01 p.m. UTC | #3
Patrick Steinhardt <ps@pks.im> writes:

>> I guess as long as "it" is not folded, it does not matter how other
>> paths from different directories in active_cache[] are sparsified or
>> expanded, as long as "pos" keeps track of the current position
>> correctly.
>
> It seems like we end up calling `ensure_full_index()` for a sparse
> index, which does cause us to signal to the caller that they should
> restart verification. So for all I understand, this function shouldn't
> act on a sparsely-populated index.

OK.  That sounds sensible and safe.
diff mbox series

Patch

diff --git a/cache-tree.c b/cache-tree.c
index 4228b6fad48..1e625673086 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -933,6 +933,11 @@  static int verify_one(struct repository *r,
 		pos = 0;
 	}
 
+	if (it->entry_count + pos > istate->cache_nr) {
+		ret = error(_("corrupted cache-tree has entries not present in index"));
+		goto out;
+	}
+
 	i = 0;
 	while (i < it->entry_count) {
 		struct cache_entry *ce = istate->cache[pos + i];
diff --git a/t/t4058-diff-duplicates.sh b/t/t4058-diff-duplicates.sh
index 2501c89c1c9..3f602adb055 100755
--- a/t/t4058-diff-duplicates.sh
+++ b/t/t4058-diff-duplicates.sh
@@ -132,15 +132,15 @@  test_expect_success 'create a few commits' '
 	rm commit_id up final
 '
 
-test_expect_failure 'git read-tree does not segfault' '
-	test_when_finished rm .git/index.lock &&
-	test_might_fail git read-tree --reset base
+test_expect_success 'git read-tree does not segfault' '
+	test_must_fail git read-tree --reset base 2>err &&
+	test_grep "error: corrupted cache-tree has entries not present in index" err
 '
 
-test_expect_failure 'reset --hard does not segfault' '
-	test_when_finished rm .git/index.lock &&
+test_expect_success 'reset --hard does not segfault' '
 	git checkout base &&
-	test_might_fail git reset --hard
+	test_must_fail git reset --hard 2>err &&
+	test_grep "error: corrupted cache-tree has entries not present in index" err
 '
 
 test_expect_failure 'git diff HEAD does not segfault' '