diff mbox series

delta-islands: fix segfault when freeing island marks

Message ID 61e490595b80b34c55fd640e093e021ff6fa9591.1676542973.git.ps@pks.im (mailing list archive)
State Accepted
Commit 6eb095d78790d0d6cbad2186685dad4e7ba2e3de
Headers show
Series delta-islands: fix segfault when freeing island marks | expand

Commit Message

Patrick Steinhardt Feb. 16, 2023, 10:29 a.m. UTC
In 647982bb71 (delta-islands: free island_marks and bitmaps, 2023-02-03)
we have introduced logic to free `island_marks` in order to reduce heap
memory usage in git-pack-objects(1). This commit is causing segfaults in
the case where this Git command does not load delta islands at all, e.g.
when reading object IDs from standard input. One such crash can be hit
when using repacking multi-pack-indices with delta islands enabled.

The root cause of this bug is that we unconditionally dereference the
`island_marks` variable even in the case where it is a `NULL` pointer,
which is fixed by making it conditional. Note that we still leave the
logic in place to set the pointer to `-1` to detect use-after-free bugs
even when there are no loaded island marks at all.
---

An easy way to reproduce the segfault is:

    $ git pack-objects .git/objects/pack/pack --delta-islands </dev/null

I didn't add a test for git-pack-objects(1) directly though, mostly
because I didn't find any location to put it. I'm happy to do so though
in case we want that.

 delta-islands.c             | 12 +++++++-----
 t/t5319-multi-pack-index.sh | 16 ++++++++++++++++
 2 files changed, 23 insertions(+), 5 deletions(-)

Comments

Eric Wong Feb. 16, 2023, 12:08 p.m. UTC | #1
Patrick Steinhardt <ps@pks.im> wrote:
> In 647982bb71 (delta-islands: free island_marks and bitmaps, 2023-02-03)
> we have introduced logic to free `island_marks` in order to reduce heap
> memory usage in git-pack-objects(1). This commit is causing segfaults in
> the case where this Git command does not load delta islands at all, e.g.
> when reading object IDs from standard input. One such crash can be hit
> when using repacking multi-pack-indices with delta islands enabled.
> 
> The root cause of this bug is that we unconditionally dereference the
> `island_marks` variable even in the case where it is a `NULL` pointer,
> which is fixed by making it conditional. Note that we still leave the
> logic in place to set the pointer to `-1` to detect use-after-free bugs
> even when there are no loaded island marks at all.

Oops, my fault :x  Thanks for this fix.

I think eliminating global flags like use_delta_islands and just
allocating `struct delta_islands_foo *' in builtin/pack-objects.c would
make it harder for people unfamiliar with the code to avoid bugs like this.

Tested-by: Eric Wong <e@80x24.org>

> An easy way to reproduce the segfault is:
> 
>     $ git pack-objects .git/objects/pack/pack --delta-islands </dev/null
> 
> I didn't add a test for git-pack-objects(1) directly though, mostly
> because I didn't find any location to put it. I'm happy to do so though
> in case we want that.

*shrug* I'm often unsure where to add tests, too.
Maybe just append it to the end of t/t5300-pack-object.sh ?
Junio C Hamano Feb. 16, 2023, 5:36 p.m. UTC | #2
Patrick Steinhardt <ps@pks.im> writes:

> In 647982bb71 (delta-islands: free island_marks and bitmaps, 2023-02-03)
> we have introduced logic to free `island_marks` in order to reduce heap
> memory usage in git-pack-objects(1). This commit is causing segfaults in
> the case where this Git command does not load delta islands at all, e.g.
> when reading object IDs from standard input. One such crash can be hit
> when using repacking multi-pack-indices with delta islands enabled.
>
> The root cause of this bug is that we unconditionally dereference the
> `island_marks` variable even in the case where it is a `NULL` pointer,
> which is fixed by making it conditional. Note that we still leave the
> logic in place to set the pointer to `-1` to detect use-after-free bugs
> even when there are no loaded island marks at all.
> ---

Missing sign-off?
Junio C Hamano Feb. 16, 2023, 6:01 p.m. UTC | #3
Patrick Steinhardt <ps@pks.im> writes:

> In 647982bb71 (delta-islands: free island_marks and bitmaps, 2023-02-03)
> we have introduced logic to free `island_marks` in order to reduce heap
> memory usage in git-pack-objects(1). This commit is causing segfaults in
> the case where this Git command does not load delta islands at all, e.g.
> when reading object IDs from standard input. One such crash can be hit
> when using repacking multi-pack-indices with delta islands enabled.
>
> The root cause of this bug is that we unconditionally dereference the
> `island_marks` variable even in the case where it is a `NULL` pointer,
> which is fixed by making it conditional. Note that we still leave the
> logic in place to set the pointer to `-1` to detect use-after-free bugs
> even when there are no loaded island marks at all.

There still are unprotected uses of island_marks in delta-islands.c
after this patch, but I think they are safe.

 * The callchain deduplicate_islands() -> mark_remote_island_1() ->
   create_or_get_island_marks() assume island_marks is not NULL, and
   the only caller of deduplicate_islands(), load_delta_islands(),
   initializes island_marks() before calling into it.

 * set_island_marks() assumes island_marks is not NULL.  One of its
   two callers, resolve_tree_islands() ensures island_marks is not
   NULL before proceeding.  The other one, propagate_island_marks(),
   also assumes island_marks is not NULL, and is called only from
   pack-objects.c::show_commit() when use_delta_islands is set.  It
   is not apparent if island_marks has already populated when this
   happens, though.

I think early in the pack-objects process, prepare_pack() calls
resolve_tree_islands() but as we have seen, it just punts when
island_marks is NULL, and not populates.  But get_object_list()
explicitly calls load_delta_islands() when use_delta_islands is
set, and that happens way before prepare_pack() gets called, so we
should be safe.

Thanks.
Patrick Steinhardt Feb. 20, 2023, 7:34 a.m. UTC | #4
On Thu, Feb 16, 2023 at 10:01:31AM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > In 647982bb71 (delta-islands: free island_marks and bitmaps, 2023-02-03)
> > we have introduced logic to free `island_marks` in order to reduce heap
> > memory usage in git-pack-objects(1). This commit is causing segfaults in
> > the case where this Git command does not load delta islands at all, e.g.
> > when reading object IDs from standard input. One such crash can be hit
> > when using repacking multi-pack-indices with delta islands enabled.
> >
> > The root cause of this bug is that we unconditionally dereference the
> > `island_marks` variable even in the case where it is a `NULL` pointer,
> > which is fixed by making it conditional. Note that we still leave the
> > logic in place to set the pointer to `-1` to detect use-after-free bugs
> > even when there are no loaded island marks at all.
> 
> There still are unprotected uses of island_marks in delta-islands.c
> after this patch, but I think they are safe.
> 
>  * The callchain deduplicate_islands() -> mark_remote_island_1() ->
>    create_or_get_island_marks() assume island_marks is not NULL, and
>    the only caller of deduplicate_islands(), load_delta_islands(),
>    initializes island_marks() before calling into it.
> 
>  * set_island_marks() assumes island_marks is not NULL.  One of its
>    two callers, resolve_tree_islands() ensures island_marks is not
>    NULL before proceeding.  The other one, propagate_island_marks(),
>    also assumes island_marks is not NULL, and is called only from
>    pack-objects.c::show_commit() when use_delta_islands is set.  It
>    is not apparent if island_marks has already populated when this
>    happens, though.
> 
> I think early in the pack-objects process, prepare_pack() calls
> resolve_tree_islands() but as we have seen, it just punts when
> island_marks is NULL, and not populates.  But get_object_list()
> explicitly calls load_delta_islands() when use_delta_islands is
> set, and that happens way before prepare_pack() gets called, so we
> should be safe.
> 
> Thanks.

Agreed. I also had a look at the other usages of `island_marks` and
couldn't find any obvious way to access it without it being initialized.

Patrick
Patrick Steinhardt Feb. 20, 2023, 7:38 a.m. UTC | #5
On Thu, Feb 16, 2023 at 09:36:20AM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > In 647982bb71 (delta-islands: free island_marks and bitmaps, 2023-02-03)
> > we have introduced logic to free `island_marks` in order to reduce heap
> > memory usage in git-pack-objects(1). This commit is causing segfaults in
> > the case where this Git command does not load delta islands at all, e.g.
> > when reading object IDs from standard input. One such crash can be hit
> > when using repacking multi-pack-indices with delta islands enabled.
> >
> > The root cause of this bug is that we unconditionally dereference the
> > `island_marks` variable even in the case where it is a `NULL` pointer,
> > which is fixed by making it conditional. Note that we still leave the
> > logic in place to set the pointer to `-1` to detect use-after-free bugs
> > even when there are no loaded island marks at all.
> > ---
> 
> Missing sign-off?

Oops, right. Please feel free to add it when pulling:

    Signed-off-by: Patrick Steinhardt <ps@pks.im>

If you prefer I'd also be happy to send another iteration of this patch,
potentially including a direct test for git-pack-objects(1). I'm still
not sure whether it'd be worth it though given that it's implicitly
covered by the higher-level test that checks for the segfault by using
git-multi-pack-index(1).

Patrick
Junio C Hamano Feb. 21, 2023, 5:15 p.m. UTC | #6
Patrick Steinhardt <ps@pks.im> writes:

> Oops, right. Please feel free to add it when pulling:
>
>     Signed-off-by: Patrick Steinhardt <ps@pks.im>

I'll add that to the copy I already have.

> If you prefer I'd also be happy to send another iteration of this patch,
> potentially including a direct test for git-pack-objects(1). I'm still
> not sure whether it'd be worth it though given that it's implicitly
> covered by the higher-level test that checks for the segfault by using
> git-multi-pack-index(1).

The existing test probably is sufficient.

Thanks.
diff mbox series

Patch

diff --git a/delta-islands.c b/delta-islands.c
index 8b234cb85b..afdec0a878 100644
--- a/delta-islands.c
+++ b/delta-islands.c
@@ -517,11 +517,13 @@  void free_island_marks(void)
 {
 	struct island_bitmap *bitmap;
 
-	kh_foreach_value(island_marks, bitmap, {
-		if (!--bitmap->refcount)
-			free(bitmap);
-	});
-	kh_destroy_oid_map(island_marks);
+	if (island_marks) {
+		kh_foreach_value(island_marks, bitmap, {
+			if (!--bitmap->refcount)
+				free(bitmap);
+		});
+		kh_destroy_oid_map(island_marks);
+	}
 
 	/* detect use-after-free with a an address which is never valid: */
 	island_marks = (void *)-1;
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index b5f9b10922..499d5d4c78 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -1015,4 +1015,20 @@  test_expect_success 'complains when run outside of a repository' '
 	grep "not a git repository" err
 '
 
+test_expect_success 'repack with delta islands' '
+	git init repo &&
+	test_when_finished "rm -fr repo" &&
+	(
+		cd repo &&
+
+		test_commit first &&
+		git repack &&
+		test_commit second &&
+		git repack &&
+
+		git multi-pack-index write &&
+		git -c repack.useDeltaIslands=true multi-pack-index repack
+	)
+'
+
 test_done