diff mbox series

[v2,1/3] range-diff/format-patch: refactor check for commit range

Message ID 3f21e10f919eead049dc2440a29bb2bed6c99d0d.1611339373.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series Range diff with ranges lacking dotdot | expand

Commit Message

Johannes Schindelin Jan. 22, 2021, 6:16 p.m. UTC
From: Johannes Schindelin <johannes.schindelin@gmx.de>

Currently, when called with exactly two arguments, `git range-diff`
tests for a literal `..` in each of the two. Likewise, the argument
provided via `--range-diff` to `git format-patch` is checked in the same
manner.

However, `<commit>^!` is a perfectly valid commit range, equivalent to
`<commit>^..<commit>` according to the `SPECIFYING RANGES` section of
gitrevisions[7].

In preparation for allowing more sophisticated ways to specify commit
ranges, let's refactor the check into its own function.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 builtin/log.c        | 2 +-
 builtin/range-diff.c | 9 +++++----
 revision.c           | 5 +++++
 revision.h           | 7 +++++++
 4 files changed, 18 insertions(+), 5 deletions(-)

Comments

Junio C Hamano Jan. 22, 2021, 8:27 p.m. UTC | #1
"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> Currently, when called with exactly two arguments, `git range-diff`
> tests for a literal `..` in each of the two. Likewise, the argument
> provided via `--range-diff` to `git format-patch` is checked in the same
> manner.
>
> However, `<commit>^!` is a perfectly valid commit range, equivalent to
> `<commit>^..<commit>` according to the `SPECIFYING RANGES` section of
> gitrevisions[7].
>
> In preparation for allowing more sophisticated ways to specify commit
> ranges, let's refactor the check into its own function.

I think the sharing between the two makes sense, but the helper
function should make it clear in its name that this is "the kind of
commit range range-diff wants to take".  Among the commit range "git
log" and friends can take, range-diff can take only a subset of it,
and only a subset of it is meaningful to range-diff (e.g. HEAD^@ is
still a commit range you can give to "git log", but it would not
make much sense to give it to range-diff).

Perhaps

    s/specifies_commit_range/is_range_diff_range/

or something.

> diff --git a/revision.h b/revision.h

Move this to range-diff.h, not revision.h which is about true commit
ranges.

> index 086ff10280d..66777c8e60f 100644
> --- a/revision.h
> +++ b/revision.h
> @@ -457,4 +457,11 @@ int rewrite_parents(struct rev_info *revs,
>   */
>  struct commit_list *get_saved_parents(struct rev_info *revs, const struct commit *commit);
>  
> +/*
> + * Determine whether the given argument defines a commit range, e.g. A..B.
> + * Note that this only validates the format but does _not_ parse it, i.e.
> + * it does _not_ look up the specified commits in the local repository.
> + */

And s/defines a commit range/is usable as a range to range-diff/

Thanks.

> +int specifies_commit_range(const char *range);
> +
>  #endif
Uwe Kleine-König Jan. 25, 2021, 7:35 a.m. UTC | #2
On Fri, Jan 22, 2021 at 12:27:10PM -0800, Junio C Hamano wrote:
> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
> 
> > From: Johannes Schindelin <johannes.schindelin@gmx.de>
> >
> > Currently, when called with exactly two arguments, `git range-diff`
> > tests for a literal `..` in each of the two. Likewise, the argument
> > provided via `--range-diff` to `git format-patch` is checked in the same
> > manner.
> >
> > However, `<commit>^!` is a perfectly valid commit range, equivalent to
> > `<commit>^..<commit>` according to the `SPECIFYING RANGES` section of
> > gitrevisions[7].
> >
> > In preparation for allowing more sophisticated ways to specify commit
> > ranges, let's refactor the check into its own function.
> 
> I think the sharing between the two makes sense, but the helper
> function should make it clear in its name that this is "the kind of
> commit range range-diff wants to take".  Among the commit range "git
> log" and friends can take, range-diff can take only a subset of it,
> and only a subset of it is meaningful to range-diff (e.g. HEAD^@ is
> still a commit range you can give to "git log", but it would not
> make much sense to give it to range-diff).

Does it make so little sense to forbid passing HEAD^@ as a range to
range-diff? I can imagine situations where is would make sense, e.g. I
often create customer patch stacks from a set of topic branches using
octopus merge. To compare two of these ^@ might be handy.

My POV is that if it's easy to use the same function (and so the same
set of range descriptors) for git log and git range-diff then do so.
This yields a consistent behaviour which is IMHO better than preventing
people to do things that are considered strange today.

Best regards
Uwe
Junio C Hamano Jan. 25, 2021, 7:24 p.m. UTC | #3
Uwe Kleine-König <u.kleine-koenig@pengutronix.de> writes:

>> > In preparation for allowing more sophisticated ways to specify commit
>> > ranges, let's refactor the check into its own function.
>> 
>> I think the sharing between the two makes sense, but the helper
>> function should make it clear in its name that this is "the kind of
>> commit range range-diff wants to take".  Among the commit range "git
>> log" and friends can take, range-diff can take only a subset of it,
>> and only a subset of it is meaningful to range-diff (e.g. HEAD^@ is
>> still a commit range you can give to "git log", but it would not
>> make much sense to give it to range-diff).
>
> Does it make so little sense to forbid passing HEAD^@ as a range to
> range-diff? I can imagine situations where is would make sense, e.g. I
> often create customer patch stacks from a set of topic branches using
> octopus merge. To compare two of these ^@ might be handy.

You can discuss for each individual syntax of a single-token range
and decide which ones are and are not suitable for range-diff, but I
suspect the reason behind this business started with dot-dot is to
perform a superficial "sanity check" at the command line parser
level before passing them to the revision machinery, and having to
deal with potential errors and/or having to compare unreasonably
large segments of history that the user did not intend.

Also I first thought that the command changes the behaviour, given
two tokens, depending on the shape of these two tokens (i.e. when
they satisfy the "is-range?" we are discussing, they are taken as
two ranges to be compared, and otherwise does something else), but
after revisiting the code and "git help range-diff", it always does
one thing when given 

 (1) one arg: gives a symmetric range and what is to be compared
     is its left and right half,

 (2) two args: each is meant to name a set of commits and these two
     are to be compared) or

 (3) three args: each is meant to name a commit, and the arg1..arg2
     and arg1..arg3 are the ranges to be compared.

so ...

> My POV is that if it's easy to use the same function (and so the same
> set of range descriptors) for git log and git range-diff then do so.
> This yields a consistent behaviour which is IMHO better than preventing
> people to do things that are considered strange today.

... I am OK with that point of view.  It certainly is simpler to
explain to end users.

Having said that, it would make it much harder to implement
efficiently, though.  For example, when your user says

	git range-diff A B

to compare "git log A" (all the way down to the root) and "git log
B" (likewise), you'd certainly optimize the older common part of the
history out, essentially turning it into

	git range-diff A..B B..A

or its moral equivalent

	git range-diff A...B

But you cannot apply such an optimization blindly.  When the user
gives A..B and B..A as two args, you somehow need to notice that 
you shouldn't rewrite it to "A..B...B..A", and for that, you'd still
need some "parsing" of these args.

So, I dunno.  Limiting the second form to only forms that the
implementation does not have to do such optimization would certainly
make it simpler for Dscho to implement ;-)
Uwe Kleine-König Jan. 25, 2021, 9:25 p.m. UTC | #4
Hello Junio,

On Mon, Jan 25, 2021 at 11:24:39AM -0800, Junio C Hamano wrote:
> Uwe Kleine-König <u.kleine-koenig@pengutronix.de> writes:
> 
> >> > In preparation for allowing more sophisticated ways to specify commit
> >> > ranges, let's refactor the check into its own function.
> >> 
> >> I think the sharing between the two makes sense, but the helper
> >> function should make it clear in its name that this is "the kind of
> >> commit range range-diff wants to take".  Among the commit range "git
> >> log" and friends can take, range-diff can take only a subset of it,
> >> and only a subset of it is meaningful to range-diff (e.g. HEAD^@ is
> >> still a commit range you can give to "git log", but it would not
> >> make much sense to give it to range-diff).
> >
> > Does it make so little sense to forbid passing HEAD^@ as a range to
> > range-diff? I can imagine situations where is would make sense, e.g. I
> > often create customer patch stacks from a set of topic branches using
> > octopus merge. To compare two of these ^@ might be handy.
> 
> You can discuss for each individual syntax of a single-token range
> and decide which ones are and are not suitable for range-diff, but I
> suspect the reason behind this business started with dot-dot is to
> perform a superficial "sanity check" at the command line parser
> level before passing them to the revision machinery, and having to
> deal with potential errors and/or having to compare unreasonably
> large segments of history that the user did not intend.
> 
> Also I first thought that the command changes the behaviour, given
> two tokens, depending on the shape of these two tokens (i.e. when
> they satisfy the "is-range?" we are discussing, they are taken as
> two ranges to be compared, and otherwise does something else), but
> after revisiting the code and "git help range-diff", it always does
> one thing when given 
> 
>  (1) one arg: gives a symmetric range and what is to be compared
>      is its left and right half,
> 
>  (2) two args: each is meant to name a set of commits and these two
>      are to be compared) or
> 
>  (3) three args: each is meant to name a commit, and the arg1..arg2
>      and arg1..arg3 are the ranges to be compared.
> 
> so ...
> 
> > My POV is that if it's easy to use the same function (and so the same
> > set of range descriptors) for git log and git range-diff then do so.
> > This yields a consistent behaviour which is IMHO better than preventing
> > people to do things that are considered strange today.
> 
> ... I am OK with that point of view.  It certainly is simpler to
> explain to end users.

It seems you understood my argument :-)

> Having said that, it would make it much harder to implement
> efficiently, though.  For example, when your user says
> 
> 	git range-diff A B
> 
> to compare "git log A" (all the way down to the root) and "git log
> B" (likewise), you'd certainly optimize the older common part of the
> history out, essentially turning it into
> 
> 	git range-diff A..B B..A
> 
> or its moral equivalent
> 
> 	git range-diff A...B
> 
> But you cannot apply such an optimization blindly.  When the user
> gives A..B and B..A as two args, you somehow need to notice that 
> you shouldn't rewrite it to "A..B...B..A", and for that, you'd still
> need some "parsing" of these args.

I agree that for a long history

	git range-diff A B

is an expensive request and I wouldn't invest too many cycles optimizing
it. (And if I'd optimize it, it wouldn't be done using textual
combination of the two strings but by checking if the two ranges
intersect. This way something like

	git range-diff v4.0..v4.6-rc1 v4.0..v4.5.6

and maybe even

	git range-diff v4.0..v4.6-rc1 v4.0-rc1..v4.5.6

would benefit, too. But note I'm not (anymore) familiar with the git
source code, so I don't know if this is easy/sensible to do and I'm just
looking at the problem from an architectural and theoretical POV.)

> So, I dunno.  Limiting the second form to only forms that the
> implementation does not have to do such optimization would certainly
> make it simpler for Dscho to implement ;-)

I don't want to make it more complicated for Dscho, I'm happy if I can
in the near future use range-diff with $rev1^! $ref2^! . So feel free to
ignore me.

Best regards
Uwe
Junio C Hamano Jan. 26, 2021, 7:25 p.m. UTC | #5
Uwe Kleine-König <u.kleine-koenig@pengutronix.de> writes:

>> > My POV is that if it's easy to use the same function (and so the same
>> > set of range descriptors) for git log and git range-diff then do so.
>> > This yields a consistent behaviour which is IMHO better than preventing
>> > people to do things that are considered strange today.
>> 
>> ... I am OK with that point of view.  It certainly is simpler to
>> explain to end users.
>
> It seems you understood my argument :-)

So it seems ;-).

>> Having said that, it would make it much harder to implement
>> efficiently, though.  For example, when your user says
>> 
>> 	git range-diff A B
>> 
>> to compare "git log A" (all the way down to the root) and "git log
>> B" (likewise), you'd certainly optimize the older common part of the
>> history out, essentially turning it into
>> 
>> 	git range-diff A..B B..A
>> 
>> or its moral equivalent
>> 
>> 	git range-diff A...B
>> 
>> But you cannot apply such an optimization blindly.  When the user
>> gives A..B and B..A as two args, you somehow need to notice that 
>> you shouldn't rewrite it to "A..B...B..A", and for that, you'd still
>> need some "parsing" of these args.
>
> I agree that for a long history
>
> 	git range-diff A B
>
> is an expensive request and I wouldn't invest too many cycles optimizing
> it.

Well, your devil's advocate can argue that accepting an input that
can easily make the tool unusable would be irresponsible, though.

And there are two possible ways out:

 (1) declare that optimizing "A B" into "A...B" is way too difficult
     to do in general, and find a good enough way to see if A and B
     individually gives a "range" that should be manageable; or

 (2) invest cycles to optimize, so your users do not have to care.

I think the series takes the former approach, and I find it
understandable.

It is a different matter if the way found and implemented in the
patch is "good enough" to tell if a given argument names a
manageable range.  You said something about "rev^{/^here are
two..dots}" that can be mistaken as a "good enough" range, but it
actually names a revision and all the way down to the root.

> (And if I'd optimize it, it wouldn't be done using textual
> combination of the two strings but by checking if the two ranges
> intersect.

Yup, that is in line with what I mumbled earlier about
setup_revisions() and inspecting the rev_info.pending, I think.

>> So, I dunno.  Limiting the second form to only forms that the
>> implementation does not have to do such optimization would certainly
>> make it simpler for Dscho to implement ;-)
>
> I don't want to make it more complicated for Dscho, I'm happy if I can
> in the near future use range-diff with $rev1^! $ref2^! . So feel free to
> ignore me.
>
> Best regards
> Uwe
diff mbox series

Patch

diff --git a/builtin/log.c b/builtin/log.c
index bd6ff4f9f95..099abdfb7e6 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -1680,7 +1680,7 @@  static void infer_range_diff_ranges(struct strbuf *r1,
 				    struct commit *head)
 {
 	const char *head_oid = oid_to_hex(&head->object.oid);
-	int prev_is_range = !!strstr(prev, "..");
+	int prev_is_range = specifies_commit_range(prev);
 
 	if (prev_is_range)
 		strbuf_addstr(r1, prev);
diff --git a/builtin/range-diff.c b/builtin/range-diff.c
index 24c4162f744..89d54158011 100644
--- a/builtin/range-diff.c
+++ b/builtin/range-diff.c
@@ -3,6 +3,7 @@ 
 #include "parse-options.h"
 #include "range-diff.h"
 #include "config.h"
+#include "revision.h"
 
 static const char * const builtin_range_diff_usage[] = {
 N_("git range-diff [<options>] <old-base>..<old-tip> <new-base>..<new-tip>"),
@@ -46,12 +47,12 @@  int cmd_range_diff(int argc, const char **argv, const char *prefix)
 		diffopt.use_color = 1;
 
 	if (argc == 2) {
-		if (!strstr(argv[0], ".."))
-			die(_("no .. in range: '%s'"), argv[0]);
+		if (!specifies_commit_range(argv[0]))
+			die(_("not a commit range: '%s'"), argv[0]);
 		strbuf_addstr(&range1, argv[0]);
 
-		if (!strstr(argv[1], ".."))
-			die(_("no .. in range: '%s'"), argv[1]);
+		if (!specifies_commit_range(argv[1]))
+			die(_("not a commit range: '%s'"), argv[1]);
 		strbuf_addstr(&range2, argv[1]);
 	} else if (argc == 3) {
 		strbuf_addf(&range1, "%s..%s", argv[0], argv[1]);
diff --git a/revision.c b/revision.c
index 9dff845bed6..00675f598a3 100644
--- a/revision.c
+++ b/revision.c
@@ -4206,3 +4206,8 @@  void put_revision_mark(const struct rev_info *revs, const struct commit *commit)
 	fputs(mark, stdout);
 	putchar(' ');
 }
+
+int specifies_commit_range(const char *range)
+{
+	return !!strstr(range, "..");
+}
diff --git a/revision.h b/revision.h
index 086ff10280d..66777c8e60f 100644
--- a/revision.h
+++ b/revision.h
@@ -457,4 +457,11 @@  int rewrite_parents(struct rev_info *revs,
  */
 struct commit_list *get_saved_parents(struct rev_info *revs, const struct commit *commit);
 
+/*
+ * Determine whether the given argument defines a commit range, e.g. A..B.
+ * Note that this only validates the format but does _not_ parse it, i.e.
+ * it does _not_ look up the specified commits in the local repository.
+ */
+int specifies_commit_range(const char *range);
+
 #endif