diff mbox series

merge-file: warn for implicit 'myers' algorithm

Message ID pull.1741.git.git.1720016469254.gitgitgadget@gmail.com (mailing list archive)
State New
Headers show
Series merge-file: warn for implicit 'myers' algorithm | expand

Commit Message

Antonin Delpeuch July 3, 2024, 2:21 p.m. UTC
From: Antonin Delpeuch <antonin@delpeuch.eu>

The current default diff algorithm for the merge-file command is
'myers', despite the default for the 'ort' strategy being 'histogram'.
Since 2.44.0 it is possible to specify a different diff algorithm via
the --diff-algorithm option. As a preparation for changing the default
to 'histogram', we warn the user about the different behaviour this
may cause.

Signed-off-by: Antonin Delpeuch <antonin@delpeuch.eu>
---
    merge-file: warn for implicit 'myers' algorithm

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1741%2Fwetneb%2Fexplicit_diff_algorithm-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1741/wetneb/explicit_diff_algorithm-v1
Pull-Request: https://github.com/git/git/pull/1741

 builtin/merge-file.c  | 8 ++++++++
 t/t6403-merge-file.sh | 5 +++--
 2 files changed, 11 insertions(+), 2 deletions(-)


base-commit: 06e570c0dfb2a2deb64d217db78e2ec21672f558

Comments

Junio C Hamano July 3, 2024, 5:30 p.m. UTC | #1
"Antonin Delpeuch via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Antonin Delpeuch <antonin@delpeuch.eu>
>
> The current default diff algorithm for the merge-file command is
> 'myers', despite the default for the 'ort' strategy being 'histogram'.

It is unclear only from the above description if `ort` should match
by adopting `myers` as its default, or vice versa.  I'll go into
more detail why I think this whole thing is done in a wrong order
later.

> Since 2.44.0 it is possible to specify a different diff algorithm via
> the --diff-algorithm option.

"2.44.0" -> "4f7fd79e (merge-file: add --diff-algorithm option,
2023-11-20)".  Thanks for that patch, by the way.

> As a preparation for changing the default
> to 'histogram', we warn the user about the different behaviour this
> may cause.

There is a huge leap in logic here.  Nobody proposed to change the
default and we had no such discussion here.  That needs to happen
before anybody can warn users that "the default will change".

Once everybody agrees that such a change is a good idea, we'd need
to devise transition plan, and one of the tasks _might_ be to add
this warning to the code, among other things we may do.  The whole
process has to be designed carefully.  Having this step as the first
one is way too suboptimal and hurts the users (e.g. "what if we
decide that using histogram as the default is not what we want to do
in the end?").

> +static int explicit_diff_algorithm = 0;

We shouldn't (and we generally do not) initialize a variable
explicitly to "0".  Just let being in the .bss section take care of
it instead.

This one is flipped by diff_algorithm_cb(), which is a callback
function that deals with the command line option "--diff-algorithm",
so calling the variable to remember the fact that the option was
used with "explicit" in its name is very much appropriate.

Now on to the real problem I have with this patch.

What do you want the end-user experience for those who saw and
understood this warning message to be?  Especially for ones who do
not really _care_ what the default algorithm used is, and would
rather tell us "I'll let Git developers to choose the best algorithm
for us, do not bother me with what exact choice you made---I'll
happily use the built-in default")?

Whether they have their favorite algorithm or they are willing to go
with the built-in default, they will keep getting shown by this
message and there is no obvious way to easily squelch the message.

If we do not count "Every time you run this command, you can give
the --diff-algorithm=myers option from the command line to squelch
it" as a usable piece of advice, that is.

Stepping back a bit.

It is curious that, even though we read the merge.conflictstyle
configuration variable by calling into xdiff-interface.c, we do not
seem to pay attention to the diff.algorithm configuration variable.
Shouldn't we teach the command to do so first, as a follow-up to
your 4f7fd79e (merge-file: add --diff-algorithm option, 2023-11-20)?
If `ort` does not pay attention to it (I do not know if it does),
then perhaps that can also be fixed in the same "preparatory"
series.  It would allow us to have a way to consistently and
uniformly configure the diff algorithm to employ regardless of the
caller of the diff machinery (and if we wanted to go fancy, we could
even introduce "diff.ort.algorithm" and "diff.merge-file.algorithm"
that overrides "diff.algorithm" that in turn overrides whatever the
built-in default is).

Such a change would prepare the codebase to allow users to say "I'll
adopt the new default that will come in Git 3.0 before it happens by
setting diff.algorithm to histogram" or "I'll set diff.algorithm to
default to express that I'll go with the flow and let Git developers
to decide".  With such a preparatory change made, you can build an
equivalent of this step, but you make sure that you pay attention to
both the command line (i.e. your explicit_diff_algorithm) and also
the configuration as ways for users to express that "I've read the
warning.  Please do not repeat it."

So, in short, I do not like this patch because of two reasons:

 - It shouldn't be the first message that begins the topic of
   flipping the default diff algorithm for merge-file.

 - Even if we arrived at a consensus to migrate the default to
   'histogram' (and I do not think we even started discussing), the
   "warning" mechanism that has no easy way to squelch is not an
   acceptable component in the migration plan.
Antonin Delpeuch July 3, 2024, 6:28 p.m. UTC | #2
Hi Junio,

I'm really sorry, I thought the switch of default and migration plan had already been agreed on in our discussion of my earlier patch.
Specifically, you wrote (https://lore.kernel.org/git/xmqq7cmdpbhq.fsf@gitster.g):

    First allow to configure the
    custom algorithm from the command line option (and optionally via a
    configuration variable) and ship it in a release, start giving a
    warning if the using script did not specify the configuration or the
    command line option and used the current default and ship it in the
    next release, wait for a few releases and then finally flip the
    default, or something like that.

So I thought it would be helpful to follow-up with a patch that implements the approach you outlined.
But I totally understand that it might be worth discussing this more.
Actually, I do agree with your assessment that this warning is not great UX.

I think relying on `diff.algorithm` is a natural idea, but it might also be confusing for users.
At least to me, the name `diff.algorithm` suggests that it's the algorithm used for "git diff",
but I might not realize that it also influences how my merges are done.
It's probably common to want different algorithms for those situations as they require different speed and accuracy trade-offs.

In any case, I'm happy to withdraw this patch. Would it be helpful if I start a new thread on the mailing list, independently from this patch, to discuss if and how the default should be switched?

Best,
Antonin
Junio C Hamano July 3, 2024, 8:19 p.m. UTC | #3
Antonin Delpeuch <antonin@delpeuch.eu> writes:

> I'm really sorry, I thought the switch of default and migration
> plan had already been agreed on in our discussion of my earlier
> patch.

Ahh, OK.

So we did some time ago floated the idea.  I do not remember how
widely accepted the proposal was, though.  Having a such reference
and an explicit mention of what we have and not have yet reached
consensus on (either in cover letter or after the three-dash line)
would have been very much helpful.

> I think relying on `diff.algorithm` is a natural idea, but it
> might also be confusing for users.  At least to me, the name
> `diff.algorithm` suggests that it's the algorithm used for "git
> diff", but I might not realize that it also influences how my
> merges are done.  It's probably common to want different
> algorithms for those situations as they require different speed
> and accuracy trade-offs.

I never thought that we should get rid of one-off command line
option.  After all, we started with command line option to support
such one-off tweaks in the earlier 4f7fd79e (merge-file: add
--diff-algorithm option, 2023-11-20) for that exact reason.

The need to have one-off capability is orthogonal to the need to
allow users to choose their own default via the configuration.  We
want to have both, given that some commands other than "git diff"
already honor the `diff.THING` configuration variables.  Doesn't
"git log -p" already pay attention to "diff.algorithm" among other
"diff.THING" variables?

A possible downside I had envisioned was that depending on the
application (i.e. "diff" that produces a patch vs an internal
implementation detail of "merge-file") the users may want to choose
the value of "diff.THING" differently.  But then we can use the
"'diff.THING' is used as the default, but 'diff.frotz.THING', when
defined, overrides the choice inside the 'git frotz' program as a
more specific configuration" pattern.

In any case, honoring things like

    [diff] algorithm = default
    [diff "merge-file"] algorithm = default

in the configuration file might be a reasonable way out to prepare
that users will have a way to squelch the warning messages.

Thanks.
Elijah Newren July 5, 2024, 5:39 p.m. UTC | #4
On Wed, Jul 3, 2024 at 1:24 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Antonin Delpeuch <antonin@delpeuch.eu> writes:
>
> > I'm really sorry, I thought the switch of default and migration
> > plan had already been agreed on in our discussion of my earlier
> > patch.
>
> Ahh, OK.
>
> So we did some time ago floated the idea.  I do not remember how
> widely accepted the proposal was, though.  Having a such reference
> and an explicit mention of what we have and not have yet reached
> consensus on (either in cover letter or after the three-dash line)
> would have been very much helpful.

There's been a few discussions, the other most recent one I remember
was the thread over at
https://lore.kernel.org/git/Y+zzh80fybq8Tn66@coredump.intra.peff.net/.
(And beyond the git community, there's
https://lkml.org/lkml/2023/5/7/206 in the kernel community, and if
others know of discussions in other large developer communities I'd be
interested in links.)

The previous discussions felt to me like we were moving towards
consensus, but while I found that encouraging since I think histogram
would eventually be a better default, I did not make any actual
proposals and try to push further towards consensus because there are
a couple known issues that I think should be fixed before we consider
flipping the default.  I have some work-in-progress that was put on
the backburner a few years ago that I would like to pick up again, and
if successful, investigate how much that helps general cases in a
format that can help people make educated decisions, and then again
float the idea of changing the default.  If consensus is reached, then
we'd change the default across the board -- diff/log/merge-file/etc.
rather than just the somewhat rarely used merge-file.  At least,
that's my current plan in this area; if others think I should
investigate things in a different order or would like to see
additional steps planned into this journey, please do let me know.
Antonin Delpeuch July 5, 2024, 8:14 p.m. UTC | #5
Thanks both.

It sounds like there is a strong case for at least making "merge-file"
honour the `diff.algorithm` config variable, possibly with more
fine-grained config variables as suggested by Junio. I'll try to make a
patch for that.

Elijah, if any of your work on improving the histogram diff can be
shared (for instance by writing down a description of the issues you'd
like to address), I would be interested to have a look and perhaps help
out if doable. But as far as I can tell there seems to be a lot of fans
of the existing algorithm already, and such heuristics can never be
perfect, so hopefully we can make the switch in a not too distant future.

Best,

Antonin
diff mbox series

Patch

diff --git a/builtin/merge-file.c b/builtin/merge-file.c
index 1f987334a31..dce2676415e 100644
--- a/builtin/merge-file.c
+++ b/builtin/merge-file.c
@@ -29,6 +29,8 @@  static int label_cb(const struct option *opt, const char *arg, int unset)
 	return 0;
 }
 
+static int explicit_diff_algorithm = 0;
+
 static int set_diff_algorithm(xpparam_t *xpp,
 			      const char *alg)
 {
@@ -50,6 +52,8 @@  static int diff_algorithm_cb(const struct option *opt,
 		return error(_("option diff-algorithm accepts \"myers\", "
 			       "\"minimal\", \"patience\" and \"histogram\""));
 
+	explicit_diff_algorithm = 1;
+
 	return 0;
 }
 
@@ -103,6 +107,10 @@  int cmd_merge_file(int argc, const char **argv, const char *prefix)
 			return error_errno("failed to redirect stderr to /dev/null");
 	}
 
+	if (!explicit_diff_algorithm) {
+		warning(_("--diff-algorithm not provided, defaulting to \"myers\". This default will change to \"histogram\" in a future version."));
+	}
+
 	if (object_id)
 		setup_git_directory();
 
diff --git a/t/t6403-merge-file.sh b/t/t6403-merge-file.sh
index fb872c5a113..9d0045be955 100755
--- a/t/t6403-merge-file.sh
+++ b/t/t6403-merge-file.sh
@@ -540,8 +540,9 @@  test_expect_success 'merging C files with "myers" diff algorithm creates some sp
 	}
 	EOF
 
-	test_must_fail git merge-file -p --diff3 --diff-algorithm myers ours.c base.c theirs.c >myers_output.c &&
-	test_cmp expect.c myers_output.c
+	test_must_fail git merge-file -p --diff3 ours.c base.c theirs.c >myers_output.c 2> err &&
+	test_cmp expect.c myers_output.c &&
+	grep "diff-algorithm not provided" err
 '
 
 test_expect_success 'merging C files with "histogram" diff algorithm avoids some spurious conflicts' '