[RFC,2/4] range-diff: don't remove funcname from inner diff
diff mbox series

Message ID 20190414210933.20875-3-t.gummerer@gmail.com
State New
Headers show
Series
  • output improvements for git range-diff
Related show

Commit Message

Thomas Gummerer April 14, 2019, 9:09 p.m. UTC
When postprocessing the inner diff in range-diff, we currently replace
the whole hunk header line with just "@@".  This matches how 'git
tbdiff' used to handle hunk headers as well.

Most likely this is being done because line numbers in the hunk header
are not relevant without other changes.  They can for example easily
change if a range is rebased, and lines are added/removed before a
change that we actually care about in our ranges.

However it can still be useful to have the function name that 'git
diff' extracts as additional context for the change.

Note that it is not guaranteed that the hunk header actually shows up
in the range-diff, and this change only aims to improve the case where
a hunk header would already be included in the final output.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 range-diff.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

Comments

Eric Sunshine April 14, 2019, 11:21 p.m. UTC | #1
On Sun, Apr 14, 2019 at 5:09 PM Thomas Gummerer <t.gummerer@gmail.com> wrote:
> [...]
> However it can still be useful to have the function name that 'git
> diff' extracts as additional context for the change.
> [...]
> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> ---
> diff --git a/range-diff.c b/range-diff.c
> @@ -102,9 +102,12 @@ static int read_patches(const char *range, struct string_list *list)
> +               } else if (starts_with(line.buf, "@@ ")) {
> +                       char *skip_lineno = strstr(line.buf + 3, "@@");
> +                       strbuf_remove(&line, 0, skip_lineno - line.buf);

It makes me a bit uncomfortable that this is not checking for NULL
return from strstr() before doing pointer arithmetic (even though the
input is presumably machine-generated).

    if (!skip_lineno)
        BUG(...);

might be appropriate.

> +                       strbuf_addch(&buf, ' ');
> +                       strbuf_addbuf(&buf, &line);
Johannes Schindelin April 15, 2019, 12:53 p.m. UTC | #2
Hi Thomas,


On Sun, 14 Apr 2019, Thomas Gummerer wrote:

> When postprocessing the inner diff in range-diff, we currently replace
> the whole hunk header line with just "@@".  This matches how 'git
> tbdiff' used to handle hunk headers as well.
>
> Most likely this is being done because line numbers in the hunk header
> are not relevant without other changes.  They can for example easily
> change if a range is rebased, and lines are added/removed before a
> change that we actually care about in our ranges.
>
> However it can still be useful to have the function name that 'git
> diff' extracts as additional context for the change.
>
> Note that it is not guaranteed that the hunk header actually shows up
> in the range-diff, and this change only aims to improve the case where
> a hunk header would already be included in the final output.

Makes sense.

> diff --git a/range-diff.c b/range-diff.c
> index 9242b8975f..f365141ade 100644
> --- a/range-diff.c
> +++ b/range-diff.c
> @@ -102,9 +102,12 @@ static int read_patches(const char *range, struct string_list *list)
>  				strbuf_addch(&buf, '\n');
>  			}
>  			continue;
> -		} else if (starts_with(line.buf, "@@ "))
> -			strbuf_addstr(&buf, "@@");
> -		else if (!line.buf[0] || starts_with(line.buf, "index "))
> +		} else if (starts_with(line.buf, "@@ ")) {
> +			char *skip_lineno = strstr(line.buf + 3, "@@");

Rather than using the magic constant "3", it would probably make sense to
declare `skip_lineno` outside of the `if` construct, and use
`skip_prefix(line.buf, "@@ ", &skip_lineno)` instead of
`starts_with(...)`.

We *will*, however, want to have a safeguard against `strstr()` not
finding anything. Maybe re-use the `p` variable that we already have, and
do this instead:

		} else if (skip_prefix(line.buf, "@@ ", &p) &&
			   (p = strstr(p, "@@"))) {

> +			strbuf_remove(&line, 0, skip_lineno - line.buf);
> +			strbuf_addch(&buf, ' ');

Shorter:

			strbuf_splice(&line, 0, p - line.buf, " ", 1);

(assuming that you accept my suggestion to use `p` instead of
`skip_lineno`...)

Thanks,
Dscho

> +			strbuf_addbuf(&buf, &line);
> +		} else if (!line.buf[0] || starts_with(line.buf, "index "))
>  			/*
>  			 * A completely blank (not ' \n', which is context)
>  			 * line is not valid in a diff.  We skip it
> --
> 2.21.0.593.g511ec345e1
>
>
Johannes Schindelin April 15, 2019, 12:54 p.m. UTC | #3
Hi Eric,

On Sun, 14 Apr 2019, Eric Sunshine wrote:

> On Sun, Apr 14, 2019 at 5:09 PM Thomas Gummerer <t.gummerer@gmail.com> wrote:
> > [...]
> > However it can still be useful to have the function name that 'git
> > diff' extracts as additional context for the change.
> > [...]
> > Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> > ---
> > diff --git a/range-diff.c b/range-diff.c
> > @@ -102,9 +102,12 @@ static int read_patches(const char *range, struct string_list *list)
> > +               } else if (starts_with(line.buf, "@@ ")) {
> > +                       char *skip_lineno = strstr(line.buf + 3, "@@");
> > +                       strbuf_remove(&line, 0, skip_lineno - line.buf);
>
> It makes me a bit uncomfortable that this is not checking for NULL
> return from strstr() before doing pointer arithmetic (even though the
> input is presumably machine-generated).
>
>     if (!skip_lineno)
>         BUG(...);

Good point, but maybe we should not go so far as to declare this a bug,
and fall back to removing everything bug the initial two `at` characters
instead?

Thanks,
Dscho

>
> might be appropriate.
>
> > +                       strbuf_addch(&buf, ' ');
> > +                       strbuf_addbuf(&buf, &line);
>
Thomas Gummerer April 15, 2019, 6:56 p.m. UTC | #4
On 04/15, Johannes Schindelin wrote:
> Hi Eric,
> 
> On Sun, 14 Apr 2019, Eric Sunshine wrote:
> 
> > On Sun, Apr 14, 2019 at 5:09 PM Thomas Gummerer <t.gummerer@gmail.com> wrote:
> > > [...]
> > > However it can still be useful to have the function name that 'git
> > > diff' extracts as additional context for the change.
> > > [...]
> > > Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> > > ---
> > > diff --git a/range-diff.c b/range-diff.c
> > > @@ -102,9 +102,12 @@ static int read_patches(const char *range, struct string_list *list)
> > > +               } else if (starts_with(line.buf, "@@ ")) {
> > > +                       char *skip_lineno = strstr(line.buf + 3, "@@");
> > > +                       strbuf_remove(&line, 0, skip_lineno - line.buf);
> >
> > It makes me a bit uncomfortable that this is not checking for NULL
> > return from strstr() before doing pointer arithmetic (even though the
> > input is presumably machine-generated).
> >
> >     if (!skip_lineno)
> >         BUG(...);
> 
> Good point, but maybe we should not go so far as to declare this a bug,
> and fall back to removing everything bug the initial two `at` characters
> instead?

I like declaring this a bug.  We are after all parsing
machine-generated output, that does come from git (which is why I
neglected the NULL checking in the first place).  If that second "@@"
is not there it's definitely a bug somewhere in the diff machinery,
I'd say.

Note that the "@@" also couldn't come from anywhere else, the diff
header has a well defined format and so does the metadata.  The diff
itself is prefixed with '<', '>' and '#' in this case, and the commit
message is also prefixed with four spaces.  So if this breaks
somewhere I'd rather hear about it loudly, than let users potentially
get wrong output because we missed something somewhere.

> >
> > might be appropriate.
> >
> > > +                       strbuf_addch(&buf, ' ');
> > > +                       strbuf_addbuf(&buf, &line);
> >
Thomas Gummerer April 15, 2019, 6:57 p.m. UTC | #5
On 04/15, Johannes Schindelin wrote:
> Hi Thomas,
> 
> 
> On Sun, 14 Apr 2019, Thomas Gummerer wrote:
> > diff --git a/range-diff.c b/range-diff.c
> > index 9242b8975f..f365141ade 100644
> > --- a/range-diff.c
> > +++ b/range-diff.c
> > @@ -102,9 +102,12 @@ static int read_patches(const char *range, struct string_list *list)
> >  				strbuf_addch(&buf, '\n');
> >  			}
> >  			continue;
> > -		} else if (starts_with(line.buf, "@@ "))
> > -			strbuf_addstr(&buf, "@@");
> > -		else if (!line.buf[0] || starts_with(line.buf, "index "))
> > +		} else if (starts_with(line.buf, "@@ ")) {
> > +			char *skip_lineno = strstr(line.buf + 3, "@@");
> 
> Rather than using the magic constant "3", it would probably make sense to
> declare `skip_lineno` outside of the `if` construct, and use
> `skip_prefix(line.buf, "@@ ", &skip_lineno)` instead of
> `starts_with(...)`.

I like this suggestion.

> We *will*, however, want to have a safeguard against `strstr()` not
> finding anything. Maybe re-use the `p` variable that we already have, and
> do this instead:
> 
> 		} else if (skip_prefix(line.buf, "@@ ", &p) &&
> 			   (p = strstr(p, "@@"))) {
> 
> > +			strbuf_remove(&line, 0, skip_lineno - line.buf);
> > +			strbuf_addch(&buf, ' ');
> 
> Shorter:
> 
> 			strbuf_splice(&line, 0, p - line.buf, " ", 1);
> 
> (assuming that you accept my suggestion to use `p` instead of
> `skip_lineno`...)

And this is also much nicer, thanks!

> Thanks,
> Dscho
> 
> > +			strbuf_addbuf(&buf, &line);
> > +		} else if (!line.buf[0] || starts_with(line.buf, "index "))
> >  			/*
> >  			 * A completely blank (not ' \n', which is context)
> >  			 * line is not valid in a diff.  We skip it
> > --
> > 2.21.0.593.g511ec345e1
> >
> >
Johannes Schindelin April 17, 2019, 11:52 a.m. UTC | #6
Hi Thomas,

On Mon, 15 Apr 2019, Thomas Gummerer wrote:

> On 04/15, Johannes Schindelin wrote:
>
> > On Sun, 14 Apr 2019, Eric Sunshine wrote:
> >
> > > On Sun, Apr 14, 2019 at 5:09 PM Thomas Gummerer <t.gummerer@gmail.com> wrote:
> > > > [...]
> > > > However it can still be useful to have the function name that 'git
> > > > diff' extracts as additional context for the change.
> > > > [...]
> > > > Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> > > > ---
> > > > diff --git a/range-diff.c b/range-diff.c
> > > > @@ -102,9 +102,12 @@ static int read_patches(const char *range, struct string_list *list)
> > > > +               } else if (starts_with(line.buf, "@@ ")) {
> > > > +                       char *skip_lineno = strstr(line.buf + 3, "@@");
> > > > +                       strbuf_remove(&line, 0, skip_lineno - line.buf);
> > >
> > > It makes me a bit uncomfortable that this is not checking for NULL
> > > return from strstr() before doing pointer arithmetic (even though the
> > > input is presumably machine-generated).
> > >
> > >     if (!skip_lineno)
> > >         BUG(...);
> >
> > Good point, but maybe we should not go so far as to declare this a bug,
> > and fall back to removing everything bug the initial two `at` characters
> > instead?
>
> I like declaring this a bug.  We are after all parsing
> machine-generated output, that does come from git (which is why I
> neglected the NULL checking in the first place).  If that second "@@"
> is not there it's definitely a bug somewhere in the diff machinery,
> I'd say.

Ah, but you do know about the micro-project I proposed to optionally feed
an mbox to `range-diff`, right?

The idea behind my proposal is that this would make it possible to
generate a range-diff between the patches on public-inbox and the commits
that actually made it into Junio's `pu`...

> Note that the "@@" also couldn't come from anywhere else, the diff
> header has a well defined format and so does the metadata.  The diff
> itself is prefixed with '<', '>' and '#' in this case, and the commit
> message is also prefixed with four spaces.  So if this breaks
> somewhere I'd rather hear about it loudly, than let users potentially
> get wrong output because we missed something somewhere.

Agreed. But I could imagine that `die()`ing here would be the more
appropriate way to holler loudly ;-)

Ciao,
Dscho
Thomas Gummerer April 24, 2019, 6:15 p.m. UTC | #7
On 04/17, Johannes Schindelin wrote:
> Hi Thomas,
> 
> On Mon, 15 Apr 2019, Thomas Gummerer wrote:
> > I like declaring this a bug.  We are after all parsing
> > machine-generated output, that does come from git (which is why I
> > neglected the NULL checking in the first place).  If that second "@@"
> > is not there it's definitely a bug somewhere in the diff machinery,
> > I'd say.
> 
> Ah, but you do know about the micro-project I proposed to optionally feed
> an mbox to `range-diff`, right?
> 
> The idea behind my proposal is that this would make it possible to
> generate a range-diff between the patches on public-inbox and the commits
> that actually made it into Junio's `pu`...

I had forgotten about that, and was only looking at what the code
currently does.

> > Note that the "@@" also couldn't come from anywhere else, the diff
> > header has a well defined format and so does the metadata.  The diff
> > itself is prefixed with '<', '>' and '#' in this case, and the commit
> > message is also prefixed with four spaces.  So if this breaks
> > somewhere I'd rather hear about it loudly, than let users potentially
> > get wrong output because we missed something somewhere.
> 
> Agreed. But I could imagine that `die()`ing here would be the more
> appropriate way to holler loudly ;-)

Yup, in the light of the potential microproject, I'm fine with just
'die()'ing here.  If we'd get into this situation right now, there
would probably be a bunch of other tests failing as well, so we might
as well make it a little more future proof.  Thanks.

Patch
diff mbox series

diff --git a/range-diff.c b/range-diff.c
index 9242b8975f..f365141ade 100644
--- a/range-diff.c
+++ b/range-diff.c
@@ -102,9 +102,12 @@  static int read_patches(const char *range, struct string_list *list)
 				strbuf_addch(&buf, '\n');
 			}
 			continue;
-		} else if (starts_with(line.buf, "@@ "))
-			strbuf_addstr(&buf, "@@");
-		else if (!line.buf[0] || starts_with(line.buf, "index "))
+		} else if (starts_with(line.buf, "@@ ")) {
+			char *skip_lineno = strstr(line.buf + 3, "@@");
+			strbuf_remove(&line, 0, skip_lineno - line.buf);
+			strbuf_addch(&buf, ' ');
+			strbuf_addbuf(&buf, &line);
+		} else if (!line.buf[0] || starts_with(line.buf, "index "))
 			/*
 			 * A completely blank (not ' \n', which is context)
 			 * line is not valid in a diff.  We skip it