
[2/2] pack-objects: fix segfault in --stdin-packs option

Message ID patch-2.2-a9702132385-20210621T145819Z-avarab@gmail.com (mailing list archive)
State Superseded
Series pack-objects: missing tests & --stdin-packs segfault fix

Commit Message

Ævar Arnfjörð Bjarmason June 21, 2021, 3:03 p.m. UTC
Fix a segfault in the --stdin-packs option added in
339bce27f4f (builtin/pack-objects.c: add '--stdin-packs' option,
2021-02-22). The read_packs_list_from_stdin() function didn't check
that the lines it was reading were valid packs, and thus when doing
the QSORT() with pack_mtime_cmp() we'd have a NULL "util" field.

The logic error was in assuming that we could iterate all packs and
annotate the excluded and included packs we got, as opposed to
checking the lines we got on stdin. There was a check for excluded
packs, but included packs were simply assumed to be valid.

As noted in the test we'll not report the first bad line, but whatever
line sorted first according to the string-list.c API. In this case I
think that's fine.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/pack-objects.c | 10 ++++++++++
 t/t5300-pack-object.sh | 18 ++++++++++++++++++
 2 files changed, 28 insertions(+)

Comments

Taylor Blau June 21, 2021, 8:33 p.m. UTC | #1
On Mon, Jun 21, 2021 at 05:03:38PM +0200, Ævar Arnfjörð Bjarmason wrote:
> Fix a segfault in the --stdin-packs option added in
> 339bce27f4f (builtin/pack-objects.c: add '--stdin-packs' option,
> 2021-02-22). The read_packs_list_from_stdin() function didn't check
> that the lines it was reading were valid packs, and thus when doing
> the QSORT() with pack_mtime_cmp() we'd have a NULL "util" field.

It may be worth mentioning that the util pointer is used to associate
the names of included/excluded packs with the packed_git structs they
correspond to. I see it's mentioned in the very next paragraph, but it
may be helpful for other readers to see this information earlier.
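
To make that association concrete, here is a minimal, self-contained
sketch. It does not use git's real headers: the two structs are cut-down
stand-ins, mtime_cmp() only follows the spirit of pack_mtime_cmp(), and
first_unmatched() and sort_by_mtime_checked() are hypothetical names made
up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Cut-down stand-ins for git's structs (assumption: real ones have more fields). */
struct packed_git {
	const char *pack_name;
	time_t mtime;
};

struct string_list_item {
	const char *string; /* line read from stdin */
	void *util;         /* matching pack, or NULL if none was found */
};

/* Comparator that dereferences util; a NULL util here is the crash scenario. */
static int mtime_cmp(const void *va, const void *vb)
{
	const struct string_list_item *a = va, *b = vb;
	const struct packed_git *pa = a->util, *pb = b->util;

	assert(pa && pb); /* precondition: the caller validated every item */
	if (pa->mtime < pb->mtime)
		return -1;
	return pa->mtime > pb->mtime;
}

/* Hypothetical helper: first item with no pack attached, or NULL. */
static const struct string_list_item *
first_unmatched(const struct string_list_item *items, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		if (!items[i].util)
			return &items[i];
	return NULL;
}

/* Validate, then sort. Returns the offending name, or NULL on success. */
const char *sort_by_mtime_checked(struct string_list_item *items, size_t nr)
{
	const struct string_list_item *bad = first_unmatched(items, nr);

	if (bad)
		return bad->string;
	qsort(items, nr, sizeof(*items), mtime_cmp);
	return NULL;
}
```

The point is only the ordering: the NULL check has to run before the
comparator ever sees an item, which is what the new loop in the patch
does with die().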

> The logic error was in assuming that we could iterate all packs and
> annotate the excluded and included packs we got, as opposed to
> checking the lines we got on stdin. There was a check for excluded
> packs, but included packs were simply assumed to be valid.
>
> As noted in the test we'll not report the first bad line, but whatever
> line sorted first according to the string-list.c API. In this case I
> think that's fine.

Yeah. There isn't really a better way to do that since we don't have a
convenient function to look up packs by their name. Much more convenient
is to loop through all packs and assign them to entries in the
string_list one by one. That's O(n*log(n)), but it doesn't really matter
here since we expect n to be small-ish, and this is by far not the most
expensive part of writing a pack.
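
A rough sketch of that shape, assuming simplified stand-in structs and
plain qsort()/bsearch() in place of the string-list.c API
(annotate_packs() is a hypothetical name, not a function in git):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins, not git's real definitions. */
struct packed_git { const char *pack_name; };
struct string_list_item { const char *string; void *util; };

static int item_cmp(const void *a, const void *b)
{
	const struct string_list_item *ia = a, *ib = b;
	return strcmp(ia->string, ib->string);
}

/*
 * Sort the input lines, attach each known pack to its matching line via
 * binary search, then return the first line (in sorted order) that no
 * pack claimed, or NULL if every line matched.
 */
const char *annotate_packs(struct string_list_item *items, size_t nr_items,
			   struct packed_git *packs, size_t nr_packs)
{
	assert(items || !nr_items);
	qsort(items, nr_items, sizeof(*items), item_cmp);

	for (size_t i = 0; i < nr_packs; i++) {
		struct string_list_item key = { packs[i].pack_name, NULL };
		struct string_list_item *hit =
			bsearch(&key, items, nr_items, sizeof(*items), item_cmp);
		if (hit)
			hit->util = &packs[i];
	}

	for (size_t i = 0; i < nr_items; i++)
		if (!items[i].util)
			return items[i].string;
	return NULL;
}
```

With input lines "zzz", "pack-2.pack", "aaa" and only pack-2.pack
present, this reports "aaa" even though "zzz" came first on stdin, which
is the sorted-order reporting behavior described in the commit message.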

You could imagine doing something O(n^2) by looping through all packs
each time you receive a line of input. That performs worse, but arguably
provides a better experience when using this mode interactively. But
that is probably a relatively rare occurrence, so it likely doesn't
matter.
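
For comparison, a sketch of the quadratic variant, again with a stand-in
struct and hypothetical helper names (find_pack(), first_bad_line()).
Each input line triggers a linear scan over the packs, so the first bad
line is reported in input order:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in, not git's real definition. */
struct packed_git { const char *pack_name; };

/* Linear scan over all packs for one name. */
static struct packed_git *find_pack(struct packed_git *packs, size_t nr_packs,
				    const char *name)
{
	for (size_t i = 0; i < nr_packs; i++)
		if (!strcmp(packs[i].pack_name, name))
			return &packs[i];
	return NULL;
}

/*
 * Check lines in the order they were received. Returns the index of the
 * first line with no matching pack, or nr_lines if all of them match.
 */
size_t first_bad_line(const char *const *lines, size_t nr_lines,
		      struct packed_git *packs, size_t nr_packs)
{
	assert(lines || !nr_lines);
	for (size_t i = 0; i < nr_lines; i++)
		if (!find_pack(packs, nr_packs, lines[i]))
			return i;
	return nr_lines;
}
```

In a real implementation the check would run as each line is read, so an
interactive user would see the error right after typing the bad line.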

Equally, you could build a mapping from pack name to packed_git struct
ahead of time, and then do the lookups in constant time. That's linear,
of course, but you pay for it in memory. Honestly, the memory cost is
probably quite reasonable, but it may not be worth the effort, since I
suspect the vast majority of usage here is from 'git repack
--geometric'.
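
A sketch of what such a mapping could look like, assuming a small
open-addressing table keyed by pack name. Git has its own hashmap API
that a real patch would presumably use instead; everything named
pack_map_* below, and the FNV-1a hash, is a hypothetical stand-in for
illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in, not git's real definition. */
struct packed_git { const char *pack_name; };

struct pack_map {
	struct packed_git **slots; /* NULL marks an empty slot */
	size_t nr_slots;           /* power of two, at least 2x the pack count */
};

/* 32-bit FNV-1a string hash. */
static uint32_t fnv1a(const char *s)
{
	uint32_t h = 2166136261u;

	for (; *s; s++) {
		h ^= (unsigned char)*s;
		h *= 16777619u;
	}
	return h;
}

/* Build the table once, up front: O(n) time and O(n) extra memory. */
int pack_map_init(struct pack_map *map, struct packed_git *packs, size_t nr)
{
	size_t cap = 8;

	while (cap < nr * 2)
		cap *= 2;
	map->slots = calloc(cap, sizeof(*map->slots));
	if (!map->slots)
		return -1;
	map->nr_slots = cap;
	for (size_t i = 0; i < nr; i++) {
		size_t pos = fnv1a(packs[i].pack_name) & (cap - 1);

		while (map->slots[pos]) /* linear probing */
			pos = (pos + 1) & (cap - 1);
		map->slots[pos] = &packs[i];
	}
	return 0;
}

/* Expected constant-time lookup; returns NULL for unknown names. */
struct packed_git *pack_map_get(const struct pack_map *map, const char *name)
{
	size_t pos;

	assert(map->nr_slots);
	pos = fnv1a(name) & (map->nr_slots - 1);
	while (map->slots[pos]) {
		if (!strcmp(map->slots[pos]->pack_name, name))
			return map->slots[pos];
		pos = (pos + 1) & (map->nr_slots - 1);
	}
	return NULL;
}

void pack_map_release(struct pack_map *map)
{
	free(map->slots);
	map->slots = NULL;
	map->nr_slots = 0;
}
```

Each stdin line could then be validated with one pack_map_get() call as
it arrives, keeping input-order error reporting without the quadratic
scan.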


> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  builtin/pack-objects.c | 10 ++++++++++
>  t/t5300-pack-object.sh | 18 ++++++++++++++++++
>  2 files changed, 28 insertions(+)
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index de00adbb9e0..65579e09fe0 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -3310,6 +3310,16 @@ static void read_packs_list_from_stdin(void)
>  			item->util = p;
>  	}
>
> +	/*
> +	 * Arguments we got on stdin may not even be packs. Check that
> +	 * to avoid segfaulting later on in e.g. pack_mtime_cmp().
> +	 */

Could be worth adding "excluded packs are handled below".

> +	for_each_string_list_item(item, &include_packs) {
> +		struct packed_git *p = item->util;
> +		if (!p)
> +			die(_("could not find pack '%s'"), item->string);
> +	}
> +
>  	/*
>  	 * First handle all of the excluded packs, marking them as kept in-core

...and it may be worth updating this comment with s/First/Then/.

> diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
> index 65e991e3706..330deec656b 100755
> --- a/t/t5300-pack-object.sh
> +++ b/t/t5300-pack-object.sh
> @@ -119,6 +119,24 @@ test_expect_success 'pack-object <stdin parsing: [|--revs] with --stdin' '
>  	test_cmp err.expect err.actual
>  '
>
> +test_expect_success 'pack-object <stdin parsing: --stdin-packs handles garbage' '
> +	cat >in <<-EOF &&
> +	$(git -C pack-object-stdin rev-parse one)
> +	$(git -C pack-object-stdin rev-parse two)
> +	EOF

It's not a big deal, but here-doc directly into `git pack-objects` is
much more common in t5300 than first redirecting it to a separate file.
I probably would have written (in a sub-shell to avoid -C
pack-object-stdin everywhere):


  (
  	cd pack-object-stdin &&
  	test_must_fail git pack-objects --stdout --stdin-packs >/dev/null 2>actual <<-EOF
  	$(git rev-parse one)
  	$(git rev-parse two)
  	EOF
  )

Although the line is kind of long anyway (and it'd be even longer since
the subshell will get its own level of indentation). So I could entirely
buy that you did this for readability, which is fine by me.

> +
> +	# We actually just report the first bad line in strcmp()
> +	# order, it just so happens that we get the same result under
> +	# SHA-1 and SHA-256 here. It does not really matter that we
> +	# report the first bad item in this obscure case, so this
> +	# oddity of the test is OK.
> +	cat >err.expect <<-EOF &&
> +	fatal: could not find pack '"'"'$(git -C pack-object-stdin rev-parse two)'"'"'
> +	EOF
> +	test_must_fail git -C pack-object-stdin pack-objects stdin-with-stdin-option --stdin-packs <in 2>err.actual &&
> +	test_cmp err.expect err.actual

If we don't care which is reported (and it just so happens that we'll
get the first one in lexical order), I would be fine with

    test_i18ngrep "could not find pack" err.actual

too. It would be good to get rid of this comment and put it in the patch
message in more detail (instead of just referring to it as "[a]s noted
in the test").

Thanks,
Taylor

Patch

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index de00adbb9e0..65579e09fe0 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3310,6 +3310,16 @@  static void read_packs_list_from_stdin(void)
 			item->util = p;
 	}
 
+	/*
+	 * Arguments we got on stdin may not even be packs. Check that
+	 * to avoid segfaulting later on in e.g. pack_mtime_cmp().
+	 */
+	for_each_string_list_item(item, &include_packs) {
+		struct packed_git *p = item->util;
+		if (!p)
+			die(_("could not find pack '%s'"), item->string);
+	}
+
 	/*
 	 * First handle all of the excluded packs, marking them as kept in-core
 	 * so that later calls to add_object_entry() discards any objects that
diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index 65e991e3706..330deec656b 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -119,6 +119,24 @@  test_expect_success 'pack-object <stdin parsing: [|--revs] with --stdin' '
 	test_cmp err.expect err.actual
 '
 
+test_expect_success 'pack-object <stdin parsing: --stdin-packs handles garbage' '
+	cat >in <<-EOF &&
+	$(git -C pack-object-stdin rev-parse one)
+	$(git -C pack-object-stdin rev-parse two)
+	EOF
+
+	# We actually just report the first bad line in strcmp()
+	# order, it just so happens that we get the same result under
+	# SHA-1 and SHA-256 here. It does not really matter that we
+	# report the first bad item in this obscure case, so this
+	# oddity of the test is OK.
+	cat >err.expect <<-EOF &&
+	fatal: could not find pack '"'"'$(git -C pack-object-stdin rev-parse two)'"'"'
+	EOF
+	test_must_fail git -C pack-object-stdin pack-objects stdin-with-stdin-option --stdin-packs <in 2>err.actual &&
+	test_cmp err.expect err.actual
+'
+
 # usage: check_deltas <stderr_from_pack_objects> <cmp_op> <nr_deltas>
 # e.g.: check_deltas stderr -gt 0
 check_deltas() {