diff mbox series

sequencer: detect author name errors in read_author_script()

Message ID YzqhEcTDwMwa8dQX@coredump.intra.peff.net (mailing list archive)
State New, archived
Headers show
Series sequencer: detect author name errors in read_author_script() | expand

Commit Message

Jeff King Oct. 3, 2022, 8:45 a.m. UTC
On Sun, Oct 02, 2022 at 11:39:16PM -0700, Michael V. Scovetta wrote:

> In commit 2a7d63a2, sequencer.c:912 looks like:
> 912  if (name_i == -2)
> 913      error(_("missing 'GIT_AUTHOR_NAME'"));
> 914  if (email_i == -2)
> 915      error(_("missing 'GIT_AUTHOR_EMAIL'"));
> 916  if (date_i == -2)
> 917      error(_("missing 'GIT_AUTHOR_DATE'"));
> 918  if (date_i < 0 || email_i < 0 || date_i < 0 || err)    <-- date_i
> is referenced here twice
> 919      goto finish;
> 
> I'm fairly sure that line 918 should be:
> 918  if (name_i < 0 || email_i < 0 || date_i < 0 || err)

Agreed, but +cc Phillip as the original author.

> I haven't validated this, but I suspect that if
> `rebase-merge/author-script` contained two GIT_AUTHOR_NAME fields,
> then name_i would be set to -1 (by the error function), but control
> wouldn't flow to finish, but instead to line 920 ( *name =
> kv.items[name_i].util; ) where it would attempt to access memory
> slightly outside of items' memory space.

Correct. It also happens if GIT_AUTHOR_NAME is missing.

> I haven't been able to actually trigger the bug, but strongly suspect
> I'm just not familiar enough with how rebasing works under the covers.

It's a little tricky, because we avoid writing and reading the
author-script file unless necessary. An easy way to need it is to break
with a conflict (which writes it), and then resume with "git rebase
--continue" (which reads it back while committing).

Here's a patch to fix it. Thanks for your report!

-- >8 --
Subject: sequencer: detect author name errors in read_author_script()

As we parse the author-script file, we check for missing or duplicate
lines for GIT_AUTHOR_NAME, etc. But after reading the whole file, our
final error conditional checks "date_i" twice and "name_i" not at all.
This not only leads to us failing to abort, but we may do an
out-of-bounds read on the string_list array.

The bug goes back to 442c36bd08 (am: improve author-script error
reporting, 2018-10-31), though the code was soon after moved to this
spot by bcd33ec25f (add read_author_script() to libgit, 2018-10-31).
It was presmably just a typo in 442c36bd08.

We'll add test coverage for all the error cases here, though only the
GIT_AUTHOR_NAME ones fail (even in a vanilla build they to segfault
consistently, but certainly with SANITIZE=address).

Reported-by: Michael V. Scovetta <michael.scovetta@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
---
The tests kind of feel like overkill, as this is such a specific
condition and I doubt we'd regress to have the same bug twice. But it
was nice at least to confirm the bug and the fix now.

 sequencer.c                    |  2 +-
 t/t3438-rebase-broken-files.sh | 53 ++++++++++++++++++++++++++++++++++
 2 files changed, 54 insertions(+), 1 deletion(-)
 create mode 100755 t/t3438-rebase-broken-files.sh

Comments

Phillip Wood Oct. 3, 2022, 9:29 a.m. UTC | #1
Hi Peff

On 03/10/2022 09:45, Jeff King wrote:
> On Sun, Oct 02, 2022 at 11:39:16PM -0700, Michael V. Scovetta wrote:
> .
>> In commit 2a7d63a2, sequencer.c:912 looks like:
>> 912  if (name_i == -2)
>> 913      error(_("missing 'GIT_AUTHOR_NAME'"));
>> 914  if (email_i == -2)
>> 915      error(_("missing 'GIT_AUTHOR_EMAIL'"));
>> 916  if (date_i == -2)
>> 917      error(_("missing 'GIT_AUTHOR_DATE'"));
>> 918  if (date_i < 0 || email_i < 0 || date_i < 0 || err)    <-- date_i
>> is referenced here twice
>> 919      goto finish;
>>
>> I'm fairly sure that line 918 should be:
>> 918  if (name_i < 0 || email_i < 0 || date_i < 0 || err)
> 
> Agreed, but +cc Phillip as the original author.

Thanks, I must have got confused as I was typing.

>> I haven't validated this, but I suspect that if
>> `rebase-merge/author-script` contained two GIT_AUTHOR_NAME fields,
>> then name_i would be set to -1 (by the error function), but control
>> wouldn't flow to finish, but instead to line 920 ( *name =
>> kv.items[name_i].util; ) where it would attempt to access memory
>> slightly outside of items' memory space.
> 
> Correct. It also happens if GIT_AUTHOR_NAME is missing.
> 
>> I haven't been able to actually trigger the bug, but strongly suspect
>> I'm just not familiar enough with how rebasing works under the covers.
> 
> It's a little tricky, because we avoid writing and reading the
> author-script file unless necessary. An easy way to need it is to break
> with a conflict (which writes it), and then resume with "git rebase
> --continue" (which reads it back while committing).
> 
> Here's a patch to fix it. Thanks for your report!

Yes thank you for reporting this Michael

> -- >8 --
> Subject: sequencer: detect author name errors in read_author_script()
> 
> As we parse the author-script file, we check for missing or duplicate
> lines for GIT_AUTHOR_NAME, etc. But after reading the whole file, our
> final error conditional checks "date_i" twice and "name_i" not at all.
> This not only leads to us failing to abort, but we may do an
> out-of-bounds read on the string_list array.
> 
> The bug goes back to 442c36bd08 (am: improve author-script error
> reporting, 2018-10-31), though the code was soon after moved to this
> spot by bcd33ec25f (add read_author_script() to libgit, 2018-10-31).
> It was presmably just a typo in 442c36bd08.

s/presmbly/presumably/

> We'll add test coverage for all the error cases here, though only the
> GIT_AUTHOR_NAME ones fail (even in a vanilla build they to segfault

Do you mean "they ought to segfault"?

Thanks for fixing this and going to the trouble of adding some tests.

Phillip

> consistently, but certainly with SANITIZE=address).
> 
> Reported-by: Michael V. Scovetta <michael.scovetta@gmail.com>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> The tests kind of feel like overkill, as this is such a specific
> condition and I doubt we'd regress to have the same bug twice. But it
> was nice at least to confirm the bug and the fix now.
> 
>   sequencer.c                    |  2 +-
>   t/t3438-rebase-broken-files.sh | 53 ++++++++++++++++++++++++++++++++++
>   2 files changed, 54 insertions(+), 1 deletion(-)
>   create mode 100755 t/t3438-rebase-broken-files.sh
> 
> diff --git a/sequencer.c b/sequencer.c
> index d26ede83c4..83e0425b04 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -915,7 +915,7 @@ int read_author_script(const char *path, char **name, char **email, char **date,
>   		error(_("missing 'GIT_AUTHOR_EMAIL'"));
>   	if (date_i == -2)
>   		error(_("missing 'GIT_AUTHOR_DATE'"));
> -	if (date_i < 0 || email_i < 0 || date_i < 0 || err)
> +	if (name_i < 0 || email_i < 0 || date_i < 0 || err)
>   		goto finish;
>   	*name = kv.items[name_i].util;
>   	*email = kv.items[email_i].util;
> diff --git a/t/t3438-rebase-broken-files.sh b/t/t3438-rebase-broken-files.sh
> new file mode 100755
> index 0000000000..e68aac4b36
> --- /dev/null
> +++ b/t/t3438-rebase-broken-files.sh
> @@ -0,0 +1,53 @@
> +#!/bin/sh
> +
> +test_description='rebase behavior when on-disk files are broken'
> +. ./test-lib.sh
> +
> +test_expect_success 'set up conflicting branches' '
> +	test_commit base file &&
> +	git checkout -b branch1 &&
> +	test_commit one file &&
> +	git checkout -b branch2 HEAD^ &&
> +	test_commit two file
> +'
> +
> +check_broken_author () {
> +	title=$1; shift
> +	script=$1; shift
> +
> +	test_expect_success "$title" '
> +		# create conflicted state
> +		test_when_finished "git rebase --abort" &&
> +		git checkout -B tmp branch2 &&
> +		test_must_fail git rebase branch1 &&
> +
> +		# break author-script
> +		'"$script"' &&
> +
> +		# resolving notices broken author-script
> +		echo resolved >file &&
> +		git add file &&
> +		test_must_fail git rebase --continue
> +	'
> +}
> +
> +for item in NAME EMAIL DATE
> +do
> +	check_broken_author "detect missing GIT_AUTHOR_$item" '
> +		grep -v $item .git/rebase-merge/author-script >tmp &&
> +		mv tmp .git/rebase-merge/author-script'
> +done
> +
> +for item in NAME EMAIL DATE
> +do
> +	check_broken_author "detect duplicate GIT_AUTHOR_$item" '
> +		grep -i $item .git/rebase-merge/author-script >tmp &&
> +		cat tmp >>.git/rebase-merge/author-script'
> +done
> +
> +check_broken_author 'unknown key in author-script' '
> +	echo "GIT_AUTHOR_BOGUS=${SQ}whatever${SQ}" \
> +		>>.git/rebase-merge/author-script'
> +
> +
> +test_done
Ævar Arnfjörð Bjarmason Oct. 3, 2022, 9:40 a.m. UTC | #2
On Mon, Oct 03 2022, Jeff King wrote:

> On Sun, Oct 02, 2022 at 11:39:16PM -0700, Michael V. Scovetta wrote:
>
>> In commit 2a7d63a2, sequencer.c:912 looks like:
>> 912  if (name_i == -2)
>> 913      error(_("missing 'GIT_AUTHOR_NAME'"));
>> 914  if (email_i == -2)
>> 915      error(_("missing 'GIT_AUTHOR_EMAIL'"));
>> 916  if (date_i == -2)
>> 917      error(_("missing 'GIT_AUTHOR_DATE'"));
>> 918  if (date_i < 0 || email_i < 0 || date_i < 0 || err)    <-- date_i
>> is referenced here twice
>> 919      goto finish;
>> 
>> I'm fairly sure that line 918 should be:
>> 918  if (name_i < 0 || email_i < 0 || date_i < 0 || err)
>
> Agreed, but +cc Phillip as the original author.
>
>> I haven't validated this, but I suspect that if
>> `rebase-merge/author-script` contained two GIT_AUTHOR_NAME fields,
>> then name_i would be set to -1 (by the error function), but control
>> wouldn't flow to finish, but instead to line 920 ( *name =
>> kv.items[name_i].util; ) where it would attempt to access memory
>> slightly outside of items' memory space.
>
> Correct. It also happens if GIT_AUTHOR_NAME is missing.
>
>> I haven't been able to actually trigger the bug, but strongly suspect
>> I'm just not familiar enough with how rebasing works under the covers.
>
> It's a little tricky, because we avoid writing and reading the
> author-script file unless necessary. An easy way to need it is to break
> with a conflict (which writes it), and then resume with "git rebase
> --continue" (which reads it back while committing).
>
> Here's a patch to fix it. Thanks for your report!
>
> -- >8 --
> Subject: sequencer: detect author name errors in read_author_script()
>
> As we parse the author-script file, we check for missing or duplicate
> lines for GIT_AUTHOR_NAME, etc. But after reading the whole file, our
> final error conditional checks "date_i" twice and "name_i" not at all.
> This not only leads to us failing to abort, but we may do an
> out-of-bounds read on the string_list array.
>
> The bug goes back to 442c36bd08 (am: improve author-script error
> reporting, 2018-10-31), though the code was soon after moved to this
> spot by bcd33ec25f (add read_author_script() to libgit, 2018-10-31).
> It was presmably just a typo in 442c36bd08.
>
> We'll add test coverage for all the error cases here, though only the
> GIT_AUTHOR_NAME ones fail (even in a vanilla build they to segfault
> consistently, but certainly with SANITIZE=address).
>
> Reported-by: Michael V. Scovetta <michael.scovetta@gmail.com>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> The tests kind of feel like overkill, as this is such a specific
> condition and I doubt we'd regress to have the same bug twice. But it
> was nice at least to confirm the bug and the fix now.

Having a regression test never feels like overkill to me :)

>
>  sequencer.c                    |  2 +-
>  t/t3438-rebase-broken-files.sh | 53 ++++++++++++++++++++++++++++++++++
>  2 files changed, 54 insertions(+), 1 deletion(-)
>  create mode 100755 t/t3438-rebase-broken-files.sh
>
> diff --git a/sequencer.c b/sequencer.c
> index d26ede83c4..83e0425b04 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -915,7 +915,7 @@ int read_author_script(const char *path, char **name, char **email, char **date,
>  		error(_("missing 'GIT_AUTHOR_EMAIL'"));
>  	if (date_i == -2)
>  		error(_("missing 'GIT_AUTHOR_DATE'"));
> -	if (date_i < 0 || email_i < 0 || date_i < 0 || err)
> +	if (name_i < 0 || email_i < 0 || date_i < 0 || err)
>  		goto finish;
>  	*name = kv.items[name_i].util;
>  	*email = kv.items[email_i].util;
> diff --git a/t/t3438-rebase-broken-files.sh b/t/t3438-rebase-broken-files.sh
> new file mode 100755
> index 0000000000..e68aac4b36
> --- /dev/null
> +++ b/t/t3438-rebase-broken-files.sh
> @@ -0,0 +1,53 @@
> +#!/bin/sh
> +
> +test_description='rebase behavior when on-disk files are broken'
> +. ./test-lib.sh
> +
> +test_expect_success 'set up conflicting branches' '
> +	test_commit base file &&
> +	git checkout -b branch1 &&
> +	test_commit one file &&
> +	git checkout -b branch2 HEAD^ &&
> +	test_commit two file
> +'
> +
> +check_broken_author () {
> +	title=$1; shift
> +	script=$1; shift
> +
> +	test_expect_success "$title" '
> +		# create conflicted state
> +		test_when_finished "git rebase --abort" &&
> +		git checkout -B tmp branch2 &&
> +		test_must_fail git rebase branch1 &&
> +
> +		# break author-script
> +		'"$script"' &&
> +
> +		# resolving notices broken author-script
> +		echo resolved >file &&
> +		git add file &&
> +		test_must_fail git rebase --continue
> +	'
> +}
> +
> +for item in NAME EMAIL DATE
> +do
> +	check_broken_author "detect missing GIT_AUTHOR_$item" '
> +		grep -v $item .git/rebase-merge/author-script >tmp &&
> +		mv tmp .git/rebase-merge/author-script'
> +done
> +
> +for item in NAME EMAIL DATE
> +do
> +	check_broken_author "detect duplicate GIT_AUTHOR_$item" '
> +		grep -i $item .git/rebase-merge/author-script >tmp &&
> +		cat tmp >>.git/rebase-merge/author-script'
> +done
> +
> +check_broken_author 'unknown key in author-script' '
> +	echo "GIT_AUTHOR_BOGUS=${SQ}whatever${SQ}" \
> +		>>.git/rebase-merge/author-script'
> +
> +
> +test_done

Maybe I just have too much PTSD from dealing with shell quoting issues
with this '"$script"" pattern when you need to pass arguments with
spaces in it, or even quotes. Although it probably won't ever be an
issue here.

But in this case just passing the "script" on stdin nicely avoids any
future quoting issues, maybe this would be good to squash in?:

diff --git a/t/t3438-rebase-broken-files.sh b/t/t3438-rebase-broken-files.sh
index e68aac4b36d..6911e145f08 100755
--- a/t/t3438-rebase-broken-files.sh
+++ b/t/t3438-rebase-broken-files.sh
@@ -12,17 +12,19 @@ test_expect_success 'set up conflicting branches' '
 '
 
 check_broken_author () {
-	title=$1; shift
-	script=$1; shift
+	title=$1 && shift &&
 
-	test_expect_success "$title" '
+	# Avoid quoting issues
+	write_script script.sh &&
+
+	test_expect_success "$title of '$script'" '
 		# create conflicted state
 		test_when_finished "git rebase --abort" &&
 		git checkout -B tmp branch2 &&
 		test_must_fail git rebase branch1 &&
 
-		# break author-script
-		'"$script"' &&
+		./script.sh >tmp &&
+		mv tmp .git/rebase-merge/author-script &&
 
 		# resolving notices broken author-script
 		echo resolved >file &&
@@ -33,21 +35,22 @@ check_broken_author () {
 
 for item in NAME EMAIL DATE
 do
-	check_broken_author "detect missing GIT_AUTHOR_$item" '
-		grep -v $item .git/rebase-merge/author-script >tmp &&
-		mv tmp .git/rebase-merge/author-script'
+	check_broken_author "detect missing GIT_AUTHOR_$item" <<-EOF
+	grep -v $item .git/rebase-merge/author-script
+	EOF
 done
 
 for item in NAME EMAIL DATE
 do
-	check_broken_author "detect duplicate GIT_AUTHOR_$item" '
-		grep -i $item .git/rebase-merge/author-script >tmp &&
-		cat tmp >>.git/rebase-merge/author-script'
+	check_broken_author "detect duplicate GIT_AUTHOR_$item" <<-EOF
+	cat .git/rebase-merge/author-script &&
+	grep -i $item .git/rebase-merge/author-script
+	EOF
 done
 
-check_broken_author 'unknown key in author-script' '
-	echo "GIT_AUTHOR_BOGUS=${SQ}whatever${SQ}" \
-		>>.git/rebase-merge/author-script'
-
+check_broken_author 'unknown key in author-script' <<-EOF
+cat .git/rebase-merge/author-script &&
+echo "GIT_AUTHOR_BOGUS=${SQ}whatever${SQ}"
+EOF
 
 test_done
Junio C Hamano Oct. 3, 2022, 4:34 p.m. UTC | #3
Jeff King <peff@peff.net> writes:

> On Sun, Oct 02, 2022 at 11:39:16PM -0700, Michael V. Scovetta wrote:
>
>> In commit 2a7d63a2, sequencer.c:912 looks like:
>> 912  if (name_i == -2)
>> 913      error(_("missing 'GIT_AUTHOR_NAME'"));
>> 914  if (email_i == -2)
>> 915      error(_("missing 'GIT_AUTHOR_EMAIL'"));
>> 916  if (date_i == -2)
>> 917      error(_("missing 'GIT_AUTHOR_DATE'"));
>> 918  if (date_i < 0 || email_i < 0 || date_i < 0 || err)    <-- date_i
>> is referenced here twice
>> 919      goto finish;
>> 
>> I'm fairly sure that line 918 should be:
>> 918  if (name_i < 0 || email_i < 0 || date_i < 0 || err)
>
> Agreed, but +cc Phillip as the original author.
>
>> I haven't validated this, but I suspect that if
>> `rebase-merge/author-script` contained two GIT_AUTHOR_NAME fields,
>> then name_i would be set to -1 (by the error function), but control
>> wouldn't flow to finish, but instead to line 920 ( *name =
>> kv.items[name_i].util; ) where it would attempt to access memory
>> slightly outside of items' memory space.
>
> Correct. It also happens if GIT_AUTHOR_NAME is missing.
>
>> I haven't been able to actually trigger the bug, but strongly suspect
>> I'm just not familiar enough with how rebasing works under the covers.
>
> It's a little tricky, because we avoid writing and reading the
> author-script file unless necessary. An easy way to need it is to break
> with a conflict (which writes it), and then resume with "git rebase
> --continue" (which reads it back while committing).
>
> Here's a patch to fix it. Thanks for your report!

And thanks for your fix ;-)
Jeff King Oct. 3, 2022, 5:15 p.m. UTC | #4
On Mon, Oct 03, 2022 at 10:29:24AM +0100, Phillip Wood wrote:

> > The bug goes back to 442c36bd08 (am: improve author-script error
> > reporting, 2018-10-31), though the code was soon after moved to this
> > spot by bcd33ec25f (add read_author_script() to libgit, 2018-10-31).
> > It was presmably just a typo in 442c36bd08.
> 
> s/presmbly/presumably/

Yep.

> > We'll add test coverage for all the error cases here, though only the
> > GIT_AUTHOR_NAME ones fail (even in a vanilla build they to segfault
> 
> Do you mean "they ought to segfault"?

I think it's just s/to//. I.e., "even in a vanilla build they segfault
consistently".

I rewrote that sentence a few times trying to make it as not-confusing
as possible, but managed to leave in an editing error anyway. ;)

-Peff
Jeff King Oct. 3, 2022, 5:27 p.m. UTC | #5
On Mon, Oct 03, 2022 at 11:40:52AM +0200, Ævar Arnfjörð Bjarmason wrote:

> > +check_broken_author 'unknown key in author-script' '
> > +	echo "GIT_AUTHOR_BOGUS=${SQ}whatever${SQ}" \
> > +		>>.git/rebase-merge/author-script'
> > +
> > +
> > +test_done
> 
> Maybe I just have too much PTSD from dealing with shell quoting issues
> with this '"$script"" pattern when you need to pass arguments with
> spaces in it, or even quotes. Although it probably won't ever be an
> issue here.

Quoting-wise, it's the exact same as passing the snippet to
test_expect_success that we already do. Which, granted, is full of
horrors, but is well understood within our code base. The "$script" is
evaluated when we pass the snippet to test_expect_success, which sees
the expansion along with the rest of the script. The double-quotes
around it ensure that whitespace is retained.

There is one extra quirk here, which is that it must not end with a
newline, since we stick "&&" on it.

> +	# Avoid quoting issues
> +	write_script script.sh &&
> +
> +	test_expect_success "$title of '$script'" '
>  		# create conflicted state
>  		test_when_finished "git rebase --abort" &&
>  		git checkout -B tmp branch2 &&
>  		test_must_fail git rebase branch1 &&
>  
> -		# break author-script
> -		'"$script"' &&
> +		./script.sh >tmp &&
> +		mv tmp .git/rebase-merge/author-script &&

One of my goals was that the script be expanded inline so that
verbose-mode showed what the test was actually doing. But this hides it
behind a vague "script.sh". Maybe that's not important enough to merit
the admitted ugliness of the loop+function we have.

If we give that up, then a better option is probably to put the
before/after parts in functions, like:

diff --git a/t/t3438-rebase-broken-files.sh b/t/t3438-rebase-broken-files.sh
index e68aac4b36..b92a3ce46b 100755
--- a/t/t3438-rebase-broken-files.sh
+++ b/t/t3438-rebase-broken-files.sh
@@ -11,43 +11,49 @@ test_expect_success 'set up conflicting branches' '
 	test_commit two file
 '
 
-check_broken_author () {
-	title=$1; shift
-	script=$1; shift
-
-	test_expect_success "$title" '
-		# create conflicted state
-		test_when_finished "git rebase --abort" &&
-		git checkout -B tmp branch2 &&
-		test_must_fail git rebase branch1 &&
-
-		# break author-script
-		'"$script"' &&
-
-		# resolving notices broken author-script
-		echo resolved >file &&
-		git add file &&
-		test_must_fail git rebase --continue
-	'
+create_conflict () {
+	test_when_finished "git rebase --abort" &&
+	git checkout -B tmp branch2 &&
+	test_must_fail git rebase branch1
+}
+
+check_resolve_fails () {
+	echo resolved >file &&
+	git add file &&
+	test_must_fail git rebase --continue
 }
 
 for item in NAME EMAIL DATE
 do
-	check_broken_author "detect missing GIT_AUTHOR_$item" '
+	test_expect_success "detect missing GIT_AUTHOR_$item" '
+		create_conflict &&
+
 		grep -v $item .git/rebase-merge/author-script >tmp &&
-		mv tmp .git/rebase-merge/author-script'
+		mv tmp .git/rebase-merge/author-script &&
+
+		check_resolve_fails
+	'
 done
 
 for item in NAME EMAIL DATE
 do
-	check_broken_author "detect duplicate GIT_AUTHOR_$item" '
+	test_expect_success "detect duplicate GIT_AUTHOR_$item" '
+		create_conflict &&
+
 		grep -i $item .git/rebase-merge/author-script >tmp &&
-		cat tmp >>.git/rebase-merge/author-script'
+		cat tmp >>.git/rebase-merge/author-script &&
+
+		check_resolve_fails
+	'
 done
 
-check_broken_author 'unknown key in author-script' '
+test_expect_success 'unknown key in author-script' '
+	create_conflict &&
+
 	echo "GIT_AUTHOR_BOGUS=${SQ}whatever${SQ}" \
-		>>.git/rebase-merge/author-script'
+		>>.git/rebase-merge/author-script &&
 
+	check_resolve_fails
+'
 
 test_done

That makes the boilerplate shorter in the "-v" output but focuses on the
actual modification that breaks the author-script.

-Peff
Jeff King Oct. 3, 2022, 5:35 p.m. UTC | #6
On Mon, Oct 03, 2022 at 04:45:06AM -0400, Jeff King wrote:

> > I haven't been able to actually trigger the bug, but strongly suspect
> > I'm just not familiar enough with how rebasing works under the covers.
> 
> It's a little tricky, because we avoid writing and reading the
> author-script file unless necessary. An easy way to need it is to break
> with a conflict (which writes it), and then resume with "git rebase
> --continue" (which reads it back while committing).
> 
> Here's a patch to fix it. Thanks for your report!

And here's a v2 based on the feedback received. The one-liner fix is the
same, but it tries to be a little less clever when constructing the
tests and has 200% more typo fixes in the commit message. Thanks Phillip
and Ævar for review.

-- >8 --
Subject: [PATCH] sequencer: detect author name errors in read_author_script()

As we parse the author-script file, we check for missing or duplicate
lines for GIT_AUTHOR_NAME, etc. But after reading the whole file, our
final error conditional checks "date_i" twice and "name_i" not at all.
This not only leads to us failing to abort, but we may do an
out-of-bounds read on the string_list array.

The bug goes back to 442c36bd08 (am: improve author-script error
reporting, 2018-10-31), though the code was soon after moved to this
spot by bcd33ec25f (add read_author_script() to libgit, 2018-10-31).
It was presumably just a typo in 442c36bd08.

We'll add test coverage for all the error cases here, though only the
GIT_AUTHOR_NAME ones fail (even in a vanilla build they segfault
consistently, but certainly with SANITIZE=address).

Reported-by: Michael V. Scovetta <michael.scovetta@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
---
 sequencer.c                    |  2 +-
 t/t3438-rebase-broken-files.sh | 59 ++++++++++++++++++++++++++++++++++
 2 files changed, 60 insertions(+), 1 deletion(-)
 create mode 100755 t/t3438-rebase-broken-files.sh

diff --git a/sequencer.c b/sequencer.c
index d26ede83c4..83e0425b04 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -915,7 +915,7 @@ int read_author_script(const char *path, char **name, char **email, char **date,
 		error(_("missing 'GIT_AUTHOR_EMAIL'"));
 	if (date_i == -2)
 		error(_("missing 'GIT_AUTHOR_DATE'"));
-	if (date_i < 0 || email_i < 0 || date_i < 0 || err)
+	if (name_i < 0 || email_i < 0 || date_i < 0 || err)
 		goto finish;
 	*name = kv.items[name_i].util;
 	*email = kv.items[email_i].util;
diff --git a/t/t3438-rebase-broken-files.sh b/t/t3438-rebase-broken-files.sh
new file mode 100755
index 0000000000..b92a3ce46b
--- /dev/null
+++ b/t/t3438-rebase-broken-files.sh
@@ -0,0 +1,59 @@
+#!/bin/sh
+
+test_description='rebase behavior when on-disk files are broken'
+. ./test-lib.sh
+
+test_expect_success 'set up conflicting branches' '
+	test_commit base file &&
+	git checkout -b branch1 &&
+	test_commit one file &&
+	git checkout -b branch2 HEAD^ &&
+	test_commit two file
+'
+
+create_conflict () {
+	test_when_finished "git rebase --abort" &&
+	git checkout -B tmp branch2 &&
+	test_must_fail git rebase branch1
+}
+
+check_resolve_fails () {
+	echo resolved >file &&
+	git add file &&
+	test_must_fail git rebase --continue
+}
+
+for item in NAME EMAIL DATE
+do
+	test_expect_success "detect missing GIT_AUTHOR_$item" '
+		create_conflict &&
+
+		grep -v $item .git/rebase-merge/author-script >tmp &&
+		mv tmp .git/rebase-merge/author-script &&
+
+		check_resolve_fails
+	'
+done
+
+for item in NAME EMAIL DATE
+do
+	test_expect_success "detect duplicate GIT_AUTHOR_$item" '
+		create_conflict &&
+
+		grep -i $item .git/rebase-merge/author-script >tmp &&
+		cat tmp >>.git/rebase-merge/author-script &&
+
+		check_resolve_fails
+	'
+done
+
+test_expect_success 'unknown key in author-script' '
+	create_conflict &&
+
+	echo "GIT_AUTHOR_BOGUS=${SQ}whatever${SQ}" \
+		>>.git/rebase-merge/author-script &&
+
+	check_resolve_fails
+'
+
+test_done
Jeff King Oct. 3, 2022, 5:39 p.m. UTC | #7
On Mon, Oct 03, 2022 at 01:27:37PM -0400, Jeff King wrote:

> -check_broken_author 'unknown key in author-script' '
> +test_expect_success 'unknown key in author-script' '
> +	create_conflict &&
> +
>  	echo "GIT_AUTHOR_BOGUS=${SQ}whatever${SQ}" \
> -		>>.git/rebase-merge/author-script'
> +		>>.git/rebase-merge/author-script &&
>  
> +	check_resolve_fails
> +'
>  
>  test_done
> 
> That makes the boilerplate shorter in the "-v" output but focuses on the
> actual modification that breaks the author-script.

Note that we do still keep the ${SQ} bits here. They're necessary for
the same reason: before and after a snippet is being passed through a
variable. Whereas in yours we'd use stdin. I _do_ like that approach in
general, but it is unlike the rest of the test suite. Maybe it's worth
resurrecting:

  https://lore.kernel.org/git/YHDUg6ZR5vu93kGm@coredump.intra.peff.net/

?

-Peff
Junio C Hamano Oct. 3, 2022, 6:05 p.m. UTC | #8
Jeff King <peff@peff.net> writes:

> On Mon, Oct 03, 2022 at 01:27:37PM -0400, Jeff King wrote:
>
>> -check_broken_author 'unknown key in author-script' '
>> +test_expect_success 'unknown key in author-script' '
>> +	create_conflict &&
>> +
>>  	echo "GIT_AUTHOR_BOGUS=${SQ}whatever${SQ}" \
>> -		>>.git/rebase-merge/author-script'
>> +		>>.git/rebase-merge/author-script &&
>>  
>> +	check_resolve_fails
>> +'
>>  
>>  test_done
>> 
>> That makes the boilerplate shorter in the "-v" output but focuses on the
>> actual modification that breaks the author-script.
>
> Note that we do still keep the ${SQ} bits here. They're necessary for
> the same reason: before and after a snippet is being passed through a
> variable. Whereas in yours we'd use stdin. I _do_ like that approach in
> general, but it is unlike the rest of the test suite. Maybe it's worth
> resurrecting:
>
>   https://lore.kernel.org/git/YHDUg6ZR5vu93kGm@coredump.intra.peff.net/
>
> ?

Yeah, it is indeed quite tempting.
Junio C Hamano Oct. 3, 2022, 6:07 p.m. UTC | #9
Jeff King <peff@peff.net> writes:

Jeff King <peff@peff.net> writes:

> On Mon, Oct 03, 2022 at 04:45:06AM -0400, Jeff King wrote:
>
>> > I haven't been able to actually trigger the bug, but strongly suspect
>> > I'm just not familiar enough with how rebasing works under the covers.
>> 
>> It's a little tricky, because we avoid writing and reading the
>> author-script file unless necessary. An easy way to need it is to break
>> with a conflict (which writes it), and then resume with "git rebase
>> --continue" (which reads it back while committing).
>> 
>> Here's a patch to fix it. Thanks for your report!
>
> And here's a v2 based on the feedback received. The one-liner fix is the
> same, but it tries to be a little less clever when constructing the
> tests and has 200% more typo fixes in the commit message. Thanks Phillip
> and Ævar for review.

The create-conflict & try-resolve pair does make it easier to see
what goes on in these tests.

Will queue.

Thanks.
diff mbox series

Patch

diff --git a/sequencer.c b/sequencer.c
index d26ede83c4..83e0425b04 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -915,7 +915,7 @@  int read_author_script(const char *path, char **name, char **email, char **date,
 		error(_("missing 'GIT_AUTHOR_EMAIL'"));
 	if (date_i == -2)
 		error(_("missing 'GIT_AUTHOR_DATE'"));
-	if (date_i < 0 || email_i < 0 || date_i < 0 || err)
+	if (name_i < 0 || email_i < 0 || date_i < 0 || err)
 		goto finish;
 	*name = kv.items[name_i].util;
 	*email = kv.items[email_i].util;
diff --git a/t/t3438-rebase-broken-files.sh b/t/t3438-rebase-broken-files.sh
new file mode 100755
index 0000000000..e68aac4b36
--- /dev/null
+++ b/t/t3438-rebase-broken-files.sh
@@ -0,0 +1,53 @@ 
+#!/bin/sh
+
+test_description='rebase behavior when on-disk files are broken'
+. ./test-lib.sh
+
+test_expect_success 'set up conflicting branches' '
+	test_commit base file &&
+	git checkout -b branch1 &&
+	test_commit one file &&
+	git checkout -b branch2 HEAD^ &&
+	test_commit two file
+'
+
+check_broken_author () {
+	title=$1; shift
+	script=$1; shift
+
+	test_expect_success "$title" '
+		# create conflicted state
+		test_when_finished "git rebase --abort" &&
+		git checkout -B tmp branch2 &&
+		test_must_fail git rebase branch1 &&
+
+		# break author-script
+		'"$script"' &&
+
+		# resolving notices broken author-script
+		echo resolved >file &&
+		git add file &&
+		test_must_fail git rebase --continue
+	'
+}
+
+for item in NAME EMAIL DATE
+do
+	check_broken_author "detect missing GIT_AUTHOR_$item" '
+		grep -v $item .git/rebase-merge/author-script >tmp &&
+		mv tmp .git/rebase-merge/author-script'
+done
+
+for item in NAME EMAIL DATE
+do
+	check_broken_author "detect duplicate GIT_AUTHOR_$item" '
+		grep -i $item .git/rebase-merge/author-script >tmp &&
+		cat tmp >>.git/rebase-merge/author-script'
+done
+
+check_broken_author 'unknown key in author-script' '
+	echo "GIT_AUTHOR_BOGUS=${SQ}whatever${SQ}" \
+		>>.git/rebase-merge/author-script'
+
+
+test_done