diff mbox series

[1/2] range-diff: fix a crash in parsing git-log output

Message ID 2375e34100e571f9c3ce658d28aba6648fba18a6.1586960921.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series range-diff: fix a crash in parsing git-log output | expand

Commit Message

John Passaro via GitGitGadget April 15, 2020, 2:28 p.m. UTC
From: Vasil Dimov <vd@FreeBSD.org>

`git range-diff` calls `git log` internally and tries to parse its
output. But `git log` output can be customized by the user in their
git config and for certain configurations either an error will be
returned by `git range-diff` or it will crash.

To fix this explicitly set the output format of the internally
executed `git log` with `--pretty=medium`. Because that cancels
`--notes`, add explicitly `--notes` at the end.

Also, make sure we never crash in the same way - trying to dereference
`util` which was never created and has remained NULL. It would happen
if the first line of `git log` output does not begin with 'commit '.

Alternative considered but discarded - somehow disable all git configs
and behave as if no config is present in the internally executed
`git log`, but that does not seem to be possible. GIT_CONFIG_NOSYSTEM
is the closest to it, but even with that we would still read
`.git/config`.

Signed-off-by: Vasil Dimov <vd@FreeBSD.org>
---
 range-diff.c          | 13 +++++++++++++
 t/t3206-range-diff.sh | 10 ++++++++++
 2 files changed, 23 insertions(+)

Comments

Junio C Hamano April 15, 2020, 3:31 p.m. UTC | #1
"Vasil Dimov via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Vasil Dimov <vd@FreeBSD.org>
>
> `git range-diff` calls `git log` internally and tries to parse its
> output. But `git log` output can be customized by the user in their
> git config and for certain configurations either an error will be
> returned by `git range-diff` or it will crash.
>
> To fix this explicitly set the output format of the internally
> executed `git log` with `--pretty=medium`. Because that cancels
> `--notes`, add explicitly `--notes` at the end.

Good finding.  

Shouldn't we also disable customizations that come from the
configuration variables like diff.external, diff.<driver>.command?

Thanks.
Taylor Blau April 15, 2020, 4:13 p.m. UTC | #2
Hi Vasil,

Nice find. I have a question below:

On Wed, Apr 15, 2020 at 02:28:40PM +0000, Vasil Dimov via GitGitGadget wrote:
> From: Vasil Dimov <vd@FreeBSD.org>
>
> `git range-diff` calls `git log` internally and tries to parse its
> output. But `git log` output can be customized by the user in their
> git config and for certain configurations either an error will be
> returned by `git range-diff` or it will crash.
>
> To fix this explicitly set the output format of the internally
> executed `git log` with `--pretty=medium`. Because that cancels
> `--notes`, add explicitly `--notes` at the end.
>
> Also, make sure we never crash in the same way - trying to dereference
> `util` which was never created and has remained NULL. It would happen
> if the first line of `git log` output does not begin with 'commit '.
>
> Alternative considered but discarded - somehow disable all git configs
> and behave as if no config is present in the internally executed
> `git log`, but that does not seem to be possible. GIT_CONFIG_NOSYSTEM
> is the closest to it, but even with that we would still read
> `.git/config`.

I don't know of a great way to do this off-hand. Perhaps an internal
`--for-range-diff` option that ignores options that are incompatible
with range-diff's own parsing seems like an OK path forward to me.

Here, internal means that it's not part of the manual page. We can pass
'PARSE_OPT_NONEG' to make sure that a caller can't later overwrite it by
passing '--no-for-range-diff'.

> Signed-off-by: Vasil Dimov <vd@FreeBSD.org>
> ---
>  range-diff.c          | 13 +++++++++++++
>  t/t3206-range-diff.sh | 10 ++++++++++
>  2 files changed, 23 insertions(+)
>
> diff --git a/range-diff.c b/range-diff.c
> index f745567cf67..5cc920be391 100644
> --- a/range-diff.c
> +++ b/range-diff.c
> @@ -63,6 +63,8 @@ static int read_patches(const char *range, struct string_list *list,
>  			"--output-indicator-old=<",
>  			"--output-indicator-context=#",
>  			"--no-abbrev-commit",
> +			"--pretty=medium",
> +			"--notes",
>  			NULL);
>  	if (other_arg)
>  		argv_array_pushv(&cp.args, other_arg->argv);
> @@ -106,6 +108,17 @@ static int read_patches(const char *range, struct string_list *list,
>  			continue;
>  		}
>
> +		if (!util) {
> +			error(_("could not parse first line of `log` output: "
> +				"did not start with 'commit ': '%s'"),
> +			      line);
> +			string_list_clear(list, 1);
> +			strbuf_release(&buf);
> +			strbuf_release(&contents);
> +			finish_command(&cp);
> +			return -1;
> +		}
> +
>  		if (starts_with(line, "diff --git")) {
>  			struct patch patch = { 0 };
>  			struct strbuf root = STRBUF_INIT;
> diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
> index bd808f87ed5..e024cff65cb 100755
> --- a/t/t3206-range-diff.sh
> +++ b/t/t3206-range-diff.sh
> @@ -513,6 +513,16 @@ test_expect_success 'range-diff overrides diff.noprefix internally' '
>  	git -c diff.noprefix=true range-diff HEAD^...
>  '
>
> +test_expect_success 'basic with modified format.pretty with suffix' '
> +	git -c format.pretty="format:commit %H%d%n" range-diff \
> +		master..topic master..unmodified
> +'
> +
> +test_expect_success 'basic with modified format.pretty without "commit "' '
> +	git -c format.pretty="format:%H%n" range-diff \
> +		master..topic master..unmodified
> +'
> +
>  test_expect_success 'range-diff compares notes by default' '
>  	git notes add -m "topic note" topic &&
>  	git notes add -m "unmodified note" unmodified &&
> --
> gitgitgadget

Otherwise what you have here does look good to me, too. I'm just not
sure that there aren't other ways that a caller could circumvent placing
'--notes --pretty=medium' at the end, anyway.

Thanks,
Taylor
Vasil Dimov April 15, 2020, 4:16 p.m. UTC | #3
On Wed, Apr 15, 2020 at 08:31:39 -0700, Junio C Hamano wrote:
> "Vasil Dimov via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: Vasil Dimov <vd@FreeBSD.org>
> >
> > `git range-diff` calls `git log` internally and tries to parse its
> > output. But `git log` output can be customized by the user in their
> > git config and for certain configurations either an error will be
> > returned by `git range-diff` or it will crash.
> >
> > To fix this explicitly set the output format of the internally
> > executed `git log` with `--pretty=medium`. Because that cancels
> > `--notes`, add explicitly `--notes` at the end.
>
> Good finding.
>
> Shouldn't we also disable customizations that come from the
> configuration variables like diff.external, diff.<driver>.command?

Hmm, I am not sure. I just read the doc about diff.<driver>.command. Is
it possible that the user has set up some custom diff tools to compare
e.g. .jpg files by only comparing their EXIF data instead of the entire
(binary) files?

Surely if the customizations wrt diff.external and
diff.<driver>.command produce result that is unparsable by
git range-diff, then they are not good. But maybe the opposite is the
case?

I don't feel confident enough to judge.
Jeff King April 15, 2020, 4:23 p.m. UTC | #4
On Wed, Apr 15, 2020 at 08:31:39AM -0700, Junio C Hamano wrote:

> "Vasil Dimov via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
> > From: Vasil Dimov <vd@FreeBSD.org>
> >
> > `git range-diff` calls `git log` internally and tries to parse its
> > output. But `git log` output can be customized by the user in their
> > git config and for certain configurations either an error will be
> > returned by `git range-diff` or it will crash.
> >
> > To fix this explicitly set the output format of the internally
> > executed `git log` with `--pretty=medium`. Because that cancels
> > `--notes`, add explicitly `--notes` at the end.
> 
> Good finding.  
> 
> Shouldn't we also disable customizations that come from the
> configuration variables like diff.external, diff.<driver>.command?

If range-diff were a script, I would say it should be using the
"rev-list | diff-tree --stdin" plumbing under the hood, rather than
"log".

The read_patches() function does let callers pass options to git-log,
but I don't _think_ this is exposed to the user. We only allow a few
--notes options to be passed, and we should be able to apply those to
diff-tree. So converting it to use plumbing might be an option.

Though I think there is another bug:

  $ git rev-list HEAD | git diff-tree --stdin --pretty=medium --notes
  commit 8f3d9f354286745c751374f5f1fcafee6b3f3136
  git: notes.c:1308: format_display_notes: Assertion `display_notes_trees' failed.
  Aborted

Another option would be for range-diff to directly call the
revision-traversal plumbing itself. There may be complications there,
though (or else it would have been done from the outset).

We should fix that assertion regardless, though.

-Peff
Taylor Blau April 15, 2020, 10:02 p.m. UTC | #5
Hi Peff,

On Wed, Apr 15, 2020 at 12:23:26PM -0400, Jeff King wrote:
> On Wed, Apr 15, 2020 at 08:31:39AM -0700, Junio C Hamano wrote:
>
> > "Vasil Dimov via GitGitGadget" <gitgitgadget@gmail.com> writes:
> >
> > > From: Vasil Dimov <vd@FreeBSD.org>
> > >
> > > `git range-diff` calls `git log` internally and tries to parse its
> > > output. But `git log` output can be customized by the user in their
> > > git config and for certain configurations either an error will be
> > > returned by `git range-diff` or it will crash.
> > >
> > > To fix this explicitly set the output format of the internally
> > > executed `git log` with `--pretty=medium`. Because that cancels
> > > `--notes`, add explicitly `--notes` at the end.
> >
> > Good finding.
> >
> > Shouldn't we also disable customizations that come from the
> > configuration variables like diff.external, diff.<driver>.command?
>
> If range-diff were a script, I would say it should be using the
> "rev-list | diff-tree --stdin" plumbing under the hood, rather than
> "log".
>
> The read_patches() function does let callers pass options to git-log,
> but I don't _think_ this is exposed to the user. We only allow a few
> --notes options to be passed, and we should be able to apply those to
> diff-tree. So converting it to use plumbing might be an option.
>
> Though I think there is another bug:
>
>   $ git rev-list HEAD | git diff-tree --stdin --pretty=medium --notes
>   commit 8f3d9f354286745c751374f5f1fcafee6b3f3136
>   git: notes.c:1308: format_display_notes: Assertion `display_notes_trees' failed.
>   Aborted

Nice find. I think that I have a patch addressing this below. Please let
me know what you think:

--- >8 ---

Subject: [PATCH] diff-tree.c: load notes machinery with '--notes'

Since its introduction in 7249e91 (revision.c: support --notes
command-line option, 2011-03-29), combining '--notes' with '--pretty'
causes 'git diff-tree' to fail a runtime assertion:

  $ git rev-list HEAD | git diff-tree --stdin --pretty=medium --notes
  commit 8f3d9f354286745c751374f5f1fcafee6b3f3136
  git: notes.c:1308: format_display_notes: Assertion `display_notes_trees' failed.
  Aborted

This failure is due to diff-tree not calling 'load_display_notes' to
initialize the notes machinery.

Ordinarily, this failure isn't triggered, because it requires passing
both '--notes' and '--pretty'. Specifically, passing '--pretty' sets
'opt->verbose_header', causing 'show_log()' to eventually call
'format_display_notes()', which expects a non-NULL 'display_note_trees'.
Without initializing the notes machinery, 'display_note_trees' remains
NULL, and thus triggers an assertion failure. This doesn't occur without
'--pretty' since we never call 'format_display_notes()' without it.

Fix this by initializing the notes machinery after parsing our options,
and harden this behavior against regression with a test in t4013.

Reported-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/diff-tree.c     | 2 ++
 t/t4013-diff-various.sh | 1 +
 2 files changed, 3 insertions(+)

diff --git a/builtin/diff-tree.c b/builtin/diff-tree.c
index cb9ea79367..17c1cc8c3c 100644
--- a/builtin/diff-tree.c
+++ b/builtin/diff-tree.c
@@ -126,6 +126,8 @@ int cmd_diff_tree(int argc, const char **argv, const char *prefix)

 	precompose_argv(argc, argv);
 	argc = setup_revisions(argc, argv, opt, &s_r_opt);
+	if (opt->show_notes)
+		load_display_notes(&opt->notes_opt);

 	while (--argc > 0) {
 		const char *arg = *++argv;
diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
index dde3f11fec..6ae8cfb271 100755
--- a/t/t4013-diff-various.sh
+++ b/t/t4013-diff-various.sh
@@ -398,6 +398,7 @@ diff --no-index --raw --no-abbrev dir2 dir

 diff-tree --pretty --root --stat --compact-summary initial
 diff-tree --pretty -R --root --stat --compact-summary initial
+diff-tree --pretty --notes initial
 diff-tree --stat --compact-summary initial mode
 diff-tree -R --stat --compact-summary initial mode
 EOF
--
2.26.0.106.g9fadedd637
Jeff King April 15, 2020, 10:29 p.m. UTC | #6
On Wed, Apr 15, 2020 at 04:02:42PM -0600, Taylor Blau wrote:

> Subject: [PATCH] diff-tree.c: load notes machinery with '--notes'
> 
> Since its introduction in 7249e91 (revision.c: support --notes
> command-line option, 2011-03-29), combining '--notes' with '--pretty'
> causes 'git diff-tree' to fail a runtime assertion:
> 
>   $ git rev-list HEAD | git diff-tree --stdin --pretty=medium --notes
>   commit 8f3d9f354286745c751374f5f1fcafee6b3f3136
>   git: notes.c:1308: format_display_notes: Assertion `display_notes_trees' failed.
>   Aborted
> 
> This failure is due to diff-tree not calling 'load_display_notes' to
> initialize the notes machinery.

Yes, I think that's the problem that I saw. And this definitely fixes
that case, but I think there's another related one.

  $ git notes add -m foo
  $ git log -1 --format='%h %N'
  94316974f7 foo
  $ git rev-list -1 HEAD | git diff-tree --stdin --format='%h %N' -s
  94316974f7e27dccc6b87f3946bce5d2fc252dc2 %N

This is true even with your patch. With your patch I can add --notes to
get the right output:

  $ git rev-list -1 HEAD | git diff-tree --stdin --format='%h %N' -s --notes
  94316974f7e27dccc6b87f3946bce5d2fc252dc2 foo

(It's also slightly curious that %h doesn't abbreviate in diff-tree; I
guess this is a side effect of the plumbing having no default abbrev
setting; it may be simplest to just live with it).

> @@ -126,6 +126,8 @@ int cmd_diff_tree(int argc, const char **argv, const char *prefix)
> 
>  	precompose_argv(argc, argv);
>  	argc = setup_revisions(argc, argv, opt, &s_r_opt);
> +	if (opt->show_notes)
> +		load_display_notes(&opt->notes_opt);

In git-log we have the equivalent of these new lines, but just before it
we check the userformat, too:

          memset(&w, 0, sizeof(w));
          userformat_find_requirements(NULL, &w);
  
          if (!rev->show_notes_given && (!rev->pretty_given || w.notes))
                  rev->show_notes = 1;
          if (rev->show_notes)
                  load_display_notes(&rev->notes_opt);

I think we'd want to do the same here. Even though it's plumbing, I
can't think of any reason why somebody would _not_ want notes to be
auto-enabled when they say "%N".

> Ordinarily, this failure isn't triggered, because it requires passing
> both '--notes' and '--pretty'. Specifically, passing '--pretty' sets
> 'opt->verbose_header', causing 'show_log()' to eventually call
> 'format_display_notes()', which expects a non-NULL 'display_note_trees'.
> Without initializing the notes machinery, 'display_note_trees' remains
> NULL, and thus triggers an assertion failure. This doesn't occur without
> '--pretty' since we never call 'format_display_notes()' without it.

It's not just --pretty, of course, but any option that causes us to
actually try to format notes (--format, --oneline, etc).

-Peff
diff mbox series

Patch

diff --git a/range-diff.c b/range-diff.c
index f745567cf67..5cc920be391 100644
--- a/range-diff.c
+++ b/range-diff.c
@@ -63,6 +63,8 @@  static int read_patches(const char *range, struct string_list *list,
 			"--output-indicator-old=<",
 			"--output-indicator-context=#",
 			"--no-abbrev-commit",
+			"--pretty=medium",
+			"--notes",
 			NULL);
 	if (other_arg)
 		argv_array_pushv(&cp.args, other_arg->argv);
@@ -106,6 +108,17 @@  static int read_patches(const char *range, struct string_list *list,
 			continue;
 		}
 
+		if (!util) {
+			error(_("could not parse first line of `log` output: "
+				"did not start with 'commit ': '%s'"),
+			      line);
+			string_list_clear(list, 1);
+			strbuf_release(&buf);
+			strbuf_release(&contents);
+			finish_command(&cp);
+			return -1;
+		}
+
 		if (starts_with(line, "diff --git")) {
 			struct patch patch = { 0 };
 			struct strbuf root = STRBUF_INIT;
diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
index bd808f87ed5..e024cff65cb 100755
--- a/t/t3206-range-diff.sh
+++ b/t/t3206-range-diff.sh
@@ -513,6 +513,16 @@  test_expect_success 'range-diff overrides diff.noprefix internally' '
 	git -c diff.noprefix=true range-diff HEAD^...
 '
 
+test_expect_success 'basic with modified format.pretty with suffix' '
+	git -c format.pretty="format:commit %H%d%n" range-diff \
+		master..topic master..unmodified
+'
+
+test_expect_success 'basic with modified format.pretty without "commit "' '
+	git -c format.pretty="format:%H%n" range-diff \
+		master..topic master..unmodified
+'
+
 test_expect_success 'range-diff compares notes by default' '
 	git notes add -m "topic note" topic &&
 	git notes add -m "unmodified note" unmodified &&