
[v2] revision: add separate field for "-m" of "diff-index -m"

Message ID 20200829201140.23425-1-sorganov@gmail.com (mailing list archive)
State Superseded
Series [v2] revision: add separate field for "-m" of "diff-index -m"

Commit Message

Sergey Organov Aug. 29, 2020, 8:11 p.m. UTC
Historically, in "diff-index -m", "-m" does not mean "do not ignore merges", but
"match missing". Despite this, diff-index abuses the 'ignore_merges' field that
is set by "-m", which in turn causes more troubles.

Add a separate 'diff_index_match_missing' field for diff-index to use, and set it
when we encounter the "-m" option. This field is then not cleared when the primary
meaning of "-m" is reverted (e.g., by "--no-diff-merges"), nor will it be
affected by future option(s) that might drive the 'ignore_merges' field.

Use this new field from diff-lib:do_oneway_diff() instead of abusing the
'ignore_merges' field.

Signed-off-by: Sergey Organov <sorganov@gmail.com>
---

v2: rebased from 'maint' onto 'master'

 diff-lib.c | 10 ++--------
 revision.c |  6 ++++++
 revision.h |  1 +
 3 files changed, 9 insertions(+), 8 deletions(-)

Comments

Junio C Hamano Aug. 31, 2020, 4:49 a.m. UTC | #1
Sergey Organov <sorganov@gmail.com> writes:

> Historically, in "diff-index -m", "-m" does not mean "do not ignore merges", but
> "match missing". Despite this, diff-index abuses the 'ignore_merges' field that
> is set by "-m", which in turn causes more troubles.

"causes more troubles"?  When there is no trouble, and no "more"
trouble, concretely mentioned, it is quite a weak justification.

There is no reason to say "historically" here, as it has been like
so from the beginning of time, it still is so and it is relied
upon.  "diff-{files,index,tree}" are about comparing two things, and
not about history (where a "merge" might influence "now we are
showing this commit.  which parent do we compare it with?"), so
giving short-and-sweet "-m" its own meaning that is sensible within
the context of "diff" was and is a perfectly sensible thing to do.

What is worth fixing is not that "-m" in diff-index means "match missing"
while "-m" in log wants to mean "show merges".  It is that, even though
both commands use the same option parsing machinery, and the uses of
these two options are mutually exclusive so there is no risk of
confusion, the flag internally used to record the presence of the "em"
option is not named neutrally (e.g. "revs->seen_em_option").

	The "log" family of commands and "diff" family of commands
	share the same command line parsing machinery.  For the
	former, "-m" means "show merges" while for the latter it
	means "match missing".  This is not a problem at the UI
	level, as "show/not show merges" is meaningless in the
	context of "diff", and similarly "match/not match missing"
	is meaningless in the context of "log".

	But there are two problems with this arrangement.

	1. the field the presence of the option on the command line
	   is recorded in has to be given a name.  It is currently
	   called "ignore_merges", which gives an incorrect
	   impression that using it for "diff" family is somehow a
	   mistake, and renaming it to "match_missing" would not be
	   a solution, as it will give an incorrect impression that
	   "log" family is abusing it.  However, naming the field
	   something neutral, e.g. "em_option", would make the code
	   harder to understand.

	2. because it uses the same command line parser, giving a
	   default for "diff -m" in a way that is different from the
	   default for "log -m" is quite cumbersome if they use the
	   same field to record it.

	Introduce a separate "match_missing" field, and flip it and
	"ignore_merges" when we see the "-m" option on the command
	line.  That way, even when ignore_merges's default is
	affected by end-user configuration, the default for
	"match_missing" would not be affected.

I think the above would be in line with what you wanted to say but
didn't, and I think it supports the split fairly well.

I have a very strong objection against changing the built-in default
of "log -m", but I do agree that this split of the single field into
two is a fairly good idea.  So I do not want to be in the position
that must reject this change because "log -m" and "diff-index -m"
will never be on by default.  Basing the justification of this
change on end-user configurability would be a good way to sidestep
the issue, and avoids taking this change hostage to the discussion
on what should be the built-in default for "log/diff-index -m".
Sergey Organov Aug. 31, 2020, 12:45 p.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:

> Sergey Organov <sorganov@gmail.com> writes:
>
>> Historically, in "diff-index -m", "-m" does not mean "do not ignore
>> merges", but "match missing". Despite this, diff-index abuses the
>> 'ignore_merges' field that is set by "-m", which in turn causes more
>> troubles.
>
> "causes more troubles"?  When there is no trouble, and no "more"
> trouble, concretely mentioned, it is quite a weak justification.

Well, the existing comment says "Backward compatibility wart", which sounds
like trouble to me already. No?

Then, since "--[no-]diff-merges" was introduced, we have:

$ git diff-index HEAD
:100644 000000 4aec621a6d1a9a5892f0b4b6feb2ed329fd04bf2 0000000000000000000000000000000000000000 D	main/main.cc
$ git diff-index -m HEAD
$ git diff-index -m --no-diff-merges HEAD
:100644 000000 4aec621a6d1a9a5892f0b4b6feb2ed329fd04bf2 0000000000000000000000000000000000000000 D	main/main.cc

that sounds like yet another trouble. That's why I used "more trouble" in my
commit message.
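
The interaction shown above can be modeled with a toy option parser. This
is only a minimal sketch, not git's actual code: struct toy_revs,
toy_parse() and the two match_missing_*() helpers are hypothetical names,
and only the field names 'ignore_merges' and 'diff_index_match_missing'
come from the patch. It assumes that 'ignore_merges' defaults to 1 and
that "--no-diff-merges" sets it back to 1.

```c
#include <string.h>

/* Toy stand-in for git's struct rev_info; only the two fields discussed here. */
struct toy_revs {
	unsigned int ignore_merges:1,
		     diff_index_match_missing:1;
};

/* Hypothetical parser that understands only "-m" and "--no-diff-merges". */
static void toy_parse(struct toy_revs *revs, int argc, const char **argv)
{
	int i;

	revs->ignore_merges = 1;		/* assumed default */
	revs->diff_index_match_missing = 0;
	for (i = 0; i < argc; i++) {
		if (!strcmp(argv[i], "-m")) {
			revs->ignore_merges = 0;
			revs->diff_index_match_missing = 1;
		} else if (!strcmp(argv[i], "--no-diff-merges")) {
			/* only the "log" meaning of "-m" is reverted */
			revs->ignore_merges = 1;
		}
	}
}

/* Before the patch: match_missing is derived from the shared field. */
static int match_missing_shared(const struct toy_revs *revs)
{
	return !revs->ignore_merges;
}

/* After the patch: match_missing is read from its own field. */
static int match_missing_split(const struct toy_revs *revs)
{
	return revs->diff_index_match_missing;
}
```

With the arguments "-m --no-diff-merges", the shared-field helper reports
that "match missing" is off, while the split-field helper still reports
it as on.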

If you say "compatibility wart" is not a trouble by itself, -- I'm fine
with it, -- then "more" in my commit message is misplaced indeed.

>
> There is no reason to say "historically" here, as it has been like
> so from the beginning of time, it still is so and it is relied
> upon.  "diff-{files,index,tree}" are about comparing two things, and
> not about history (where a "merge" might influence "now we are
> showing this commit.  which parent do we compare it with?"), so
> giving short-and-sweet "-m" its own meaning that is sensible within
> the context of "diff" was and is a perfectly sensible thing to do.

Well, if "historically" makes you feel uncomfortable, -- I'm willing to
get rid of it.

>
> What is worth fixing is not that "-m" in diff-index means "match missing"
> while "-m" in log wants to mean "show merges".  It is that, even though
> both commands use the same option parsing machinery, and the uses of
> these two options are mutually exclusive so there is no risk of
> confusion, the flag internally used to record the presence of the "em"
> option is not named neutrally (e.g. "revs->seen_em_option").
>
> 	The "log" family of commands and "diff" family of commands
> 	share the same command line parsing machinery.  For the
> 	former, "-m" means "show merges" while for the latter it
> 	means "match missing".  This is not a problem at the UI
> 	level, as "show/not show merges" is meaningless in the
> 	context of "diff", and similarly "match/not match missing"
> 	is meaningless in the context of "log".
>
> 	But there are two problems with this arrangement.
>
> 	1. the field the presence of the option on the command line
> 	   is recorded in has to be given a name.  It is currently
> 	   called "ignore_merges", which gives an incorrect
> 	   impression that using it for "diff" family is somehow a
> 	   mistake, and renaming it to "match_missing" would not be
> 	   a solution, as it will give an incorrect impression that
> 	   "log" family is abusing it.  However, naming the field
> 	   something neutral, e.g. "em_option", would make the code
> 	   harder to understand.
>
> 	2. because it uses the same command line parser, giving a
> 	   default for "diff -m" in a way that is different from the
> 	   default for "log -m" is quite cumbersome if they use the
> 	   same field to record it.
>
> 	Introduce a separate "match_missing" field, and flip it and
> 	"ignore_merges" when we see the "-m" option on the command
> 	line.  That way, even when ignore_merges's default is
> 	affected by end-user configuration, the default for
> 	"match_missing" would not be affected.
>
> I think the above would be in line with what you wanted to say but
> didn't, and I think it supports the split fairly well.
>
> I have a very strong objection against changing the built-in default
> of "log -m", but I do agree that this split of the single field into
> two is a fairly good idea.  So I do not want to be in the position
> that must reject this change because "log -m" and "diff-index -m"
> will never be on by default.  Basing the justification of this
> change on end-user configurability would be a good way to sidestep
> the issue, and avoids taking this change hostage to the discussion
> on what should be the built-in default for "log/diff-index -m".

This change has nothing to do with defaults. It is rather about correct and
clear code.

I'll re-roll with a better commit message.

Thanks,
-- Sergey
Junio C Hamano Aug. 31, 2020, 5 p.m. UTC | #3
Sergey Organov <sorganov@gmail.com> writes:

> $ git diff-index -m --no-diff-merges HEAD
> :100644 000000 4aec621a6d1a9a5892f0b4b6feb2ed329fd04bf2 0000000000000000000000000000000000000000 D	main/main.cc

At first glance, this looked like a good justification for this
patch.

> If you say "compatibility wart" is not a trouble by itself, -- I'm fine
> with it, -- then "more" in my commit message is misplaced indeed.

Yeah, when I wrote the "compatibility wart" comment originally, I
was describing "this needs a tricky code because two independent
options happen to share the command line parser" and nothing more.

I was not reacting to "more", by the way.  I was reacting to the lack
of a concrete problem description.  "A '-m' option given to the
'diff-index' command can be defeated by giving '--no-diff-merges'
later" you showed above can be a good replacement for "causes more
troubles".

But in the ideal world, "--[no-]diff-merges" should be rejected as
an irrelevant/unrecognised option to the "diff" family of commands
(as I said in the message you are responding to, it is only relevant
to the "log" family of commands where the diff machinery is solely
to compare between (some of) its parents and in that context, what,
if anything, kind of special treatment is made for merge commits
makes sense as an optional instruction to the command).  Splitting
the field into two fields, setting both fields upon "-m" but
toggling only one with longhand "--[no-]diff-merges" would allow the
code to notice and make the above command line silently turn the
"--[no-]diff-merges" into a no-op, so in that sense it would be a
good first step, but an ideal solution would probably need to know
if we are parsing for the "log" family or for the "diff" family and
error out upon seeing a "log"-only option like "--[no-]diff-merges"
when checking the command line option for "diff".
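
The "ideal solution" described above could look roughly like the
following. This is a hedged sketch under stated assumptions, not a
proposal for git's real parser: enum toy_family, struct toy_opts and
toy_handle_opt() are hypothetical names invented for illustration, and
the sketch assumes the caller can tell the parser which family of
commands it is parsing for.

```c
#include <string.h>

/* Hypothetical tag telling the parser which command family is running. */
enum toy_family { TOY_FAMILY_LOG, TOY_FAMILY_DIFF };

/* Toy option state with the two split fields. */
struct toy_opts {
	unsigned int ignore_merges:1,
		     match_missing:1;
};

/*
 * Handle a single option.  Returns 0 on success and -1 when the option
 * is unknown or is not valid for the given family.
 */
static int toy_handle_opt(struct toy_opts *opts, enum toy_family family,
			  const char *arg)
{
	if (!strcmp(arg, "-m")) {
		/* as in the patch, "-m" sets both fields */
		opts->ignore_merges = 0;
		opts->match_missing = 1;
		return 0;
	}
	if (!strcmp(arg, "--no-diff-merges")) {
		if (family != TOY_FAMILY_LOG)
			return -1;	/* "log"-only option given to "diff" */
		opts->ignore_merges = 1;
		return 0;
	}
	return -1;	/* anything else is unknown to this sketch */
}
```

In this sketch, "--no-diff-merges" given to the "diff" family is rejected
and leaves 'match_missing' untouched, instead of silently becoming a
no-op.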

> This change has nothing to do with defaults. It is rather about correct and
> clear code.

OK, I misread your intention.  Sorry about that.

Thanks.

Patch

diff --git a/diff-lib.c b/diff-lib.c
index 50521e2093fc..f2aee78e7aa2 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -405,14 +405,8 @@  static void do_oneway_diff(struct unpack_trees_options *o,
 	/* if the entry is not checked out, don't examine work tree */
 	cached = o->index_only ||
 		(idx && ((idx->ce_flags & CE_VALID) || ce_skip_worktree(idx)));
-	/*
-	 * Backward compatibility wart - "diff-index -m" does
-	 * not mean "do not ignore merges", but "match_missing".
-	 *
-	 * But with the revision flag parsing, that's found in
-	 * "!revs->ignore_merges".
-	 */
-	match_missing = !revs->ignore_merges;
+
+	match_missing = revs->diff_index_match_missing;
 
 	if (cached && idx && ce_stage(idx)) {
 		struct diff_filepair *pair;
diff --git a/revision.c b/revision.c
index 96630e31867d..64b16f7d1033 100644
--- a/revision.c
+++ b/revision.c
@@ -2345,6 +2345,12 @@  static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 		revs->diffopt.flags.tree_in_recursive = 1;
 	} else if (!strcmp(arg, "-m")) {
 		revs->ignore_merges = 0;
+		/*
+		 * Backward compatibility wart - "diff-index -m" does
+		 * not mean "do not ignore merges", but "match_missing",
+		 * so set separate flag for it.
+		 */
+		revs->diff_index_match_missing = 1;
 	} else if ((argcount = parse_long_opt("diff-merges", argv, &optarg))) {
 		if (!strcmp(optarg, "off")) {
 			revs->ignore_merges = 1;
diff --git a/revision.h b/revision.h
index c1e5bcf139d7..5ae8254ffaed 100644
--- a/revision.h
+++ b/revision.h
@@ -188,6 +188,7 @@  struct rev_info {
 	unsigned int	diff:1,
 			full_diff:1,
 			show_root_diff:1,
+			diff_index_match_missing:1,
 			no_commit_id:1,
 			verbose_header:1,
 			combine_merges:1,