Message ID | pull.1741.git.git.1720016469254.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | merge-file: warn for implicit 'myers' algorithm | expand |
"Antonin Delpeuch via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Antonin Delpeuch <antonin@delpeuch.eu> > > The current default diff algorithm for the merge-file command is > 'myers', despite the default for the 'ort' strategy being 'histogram'. It is unclear only from the above description if `ort` should match by adopting `myers` as its default, or vice versa. I'll go into more detail why I think this whole thing is done in a wrong order later. > Since 2.44.0 it is possible to specify a different diff algorithm via > the --diff-algorithm option. "2.44.0" -> "4f7fd79e (merge-file: add --diff-algorithm option, 2023-11-20)". Thanks for that patch, by the way. > As a preparation for changing the default > to 'histogram', we warn the user about the different behaviour this > may cause. There is a huge leap in logic here. Nobody proposed to change the default and we had no such discussion here. That needs to happen before anybody can warn users that "the default will change". Once everybody agrees that such a change is a good idea, we'd need to devise transition plan, and one of the tasks _might_ be to add this warning to the code, among other things we may do. The whole process has to be designed carefully. Having this step as the first one is way too suboptimal and hurts the users (e.g. "what if we decide that using histogram as the default is not what we want to do in the end?"). > +static int explicit_diff_algorithm = 0; We shouldn't (and we generally do not) initialize a variable explicitly to "0". Just let being in the .bss section take care of it instead. This one is flipped by diff_algorithm_cb(), which is a callback function that deals with the command line option "--diff-algorithm", so calling the variable to remember the fact that the option was used with "explicit" in its name is very much appropriate. Now on to the real problem I have with this patch. What do you want the end-user experience for those who saw and understood this warning message to be? Especially for ones who do not really _care_ what the default algorithm used is, and would rather tell us "I'll let Git developers to choose the best algorithm for us, do not bother me with what exact choice you made---I'll happily use the built-in default")? Whether they have their favorite algorithm or they are willing to go with the built-in default, they will keep getting shown by this message and there is no obvious way to easily squelch the message. If we do not count "Every time you run this command, you can give the --diff-algorithm=myers option from the command line to squelch it" as a usable piece of advice, that is. Stepping back a bit. It is curious that, even though we read the merge.conflictstyle configuration variable by calling into xdiff-interface.c, we do not seem to pay attention to the diff.algorithm configuration variable. Shouldn't we teach the command to do so first, as a follow-up to your 4f7fd79e (merge-file: add --diff-algorithm option, 2023-11-20)? If `ort` does not pay attention to it (I do not know if it does), then perhaps that can also be fixed in the same "preparatory" series. It would allow us to have a way to consistently and uniformly configure the diff algorithm to employ regardless of the caller of the diff machinery (and if we wanted to go fancy, we could even introduce "diff.ort.algorithm" and "diff.merge-file.algorithm" that overrides "diff.algorithm" that in turn overrides whatever the built-in default is). Such a change would prepare the codebase to allow users to say "I'll adopt the new default that will come in Git 3.0 before it happens by setting diff.algorithm to histogram" or "I'll set diff.algorithm to default to express that I'll go with the flow and let Git developers to decide". With such a preparatory change made, you can build an equivalent of this step, but you make sure that you pay attention to both the command line (i.e. your explicit_diff_algorithm) and also the configuration as ways for users to express that "I've read the warning. Please do not repeat it." So, in short, I do not like this patch because of two reasons: - It shouldn't be the first message that begins the topic of flipping the default diff algorithm for merge-file. - Even if we arrived at a consensus to migrate the default to 'histogram' (and I do not think we even started discussing), the "warning" mechanism that has no easy way to squelch is not an acceptable component in the migration plan.
Hi Junio, I'm really sorry, I thought the switch of default and migration plan had already been agreed on in our discussion of my earlier patch. Specifically, you wrote (https://lore.kernel.org/git/xmqq7cmdpbhq.fsf@gitster.g): First allow to configure the custom algorithm from the command line option (and optionally via a configuration variable) and ship it in a release, start giving a warning if the using script did not specify the configuration or the command line option and used the current default and ship it in the next release, wait for a few releases and then finally flip the default, or something like that. So I thought it would be helpful to follow-up with a patch that implements the approach you outlined. But I totally understand that it might be worth discussing this more. Actually, I do agree with your assessment that this warning is not great UX. I think relying on `diff.algorithm` is a natural idea, but it might also be confusing for users. At least to me, the name `diff.algorithm` suggests that it's the algorithm used for "git diff", but I might not realize that it also influences how my merges are done. It's probably common to want different algorithms for those situations as they require different speed and accuracy trade-offs. In any case, I'm happy to withdraw this patch. Would it be helpful if I start a new thread on the mailing list, independently from this patch, to discuss if and how the default should be switched? Best, Antonin
Antonin Delpeuch <antonin@delpeuch.eu> writes: > I'm really sorry, I thought the switch of default and migration > plan had already been agreed on in our discussion of my earlier > patch. Ahh, OK. So we did some time ago floated the idea. I do not remember how widely accepted the proposal was, though. Having a such reference and an explicit mention of what we have and not have yet reached consensus on (either in cover letter or after the three-dash line) would have been very much helpful. > I think relying on `diff.algorithm` is a natural idea, but it > might also be confusing for users. At least to me, the name > `diff.algorithm` suggests that it's the algorithm used for "git > diff", but I might not realize that it also influences how my > merges are done. It's probably common to want different > algorithms for those situations as they require different speed > and accuracy trade-offs. I never thought that we should get rid of one-off command line option. After all, we started with command line option to support such one-off tweaks in the earlier 4f7fd79e (merge-file: add --diff-algorithm option, 2023-11-20) for that exact reason. The need to have one-off capability is orthogonal to the need to allow users to choose their own default via the configuration. We want to have both, given that some commands other than "git diff" already honor the `diff.THING` configuration variables. Doesn't "git log -p" already pay attention to "diff.algorithm" among other "diff.THING" variables? A possible downside I had envisioned was that depending on the application (i.e. "diff" that produces a patch vs an internal implementation detail of "merge-file") the users may want to choose the value of "diff.THING" differently. But then we can use the "'diff.THING' is used as the default, but 'diff.frotz.THING', when defined, overrides the choice inside the 'git frotz' program as a more specific configuration" pattern. In any case, honoring things like [diff] algorithm = default [diff "merge-file"] algorithm = default in the configuration file might be a reasonable way out to prepare that users will have a way to squelch the warning messages. Thanks.
On Wed, Jul 3, 2024 at 1:24 PM Junio C Hamano <gitster@pobox.com> wrote: > > Antonin Delpeuch <antonin@delpeuch.eu> writes: > > > I'm really sorry, I thought the switch of default and migration > > plan had already been agreed on in our discussion of my earlier > > patch. > > Ahh, OK. > > So we did some time ago floated the idea. I do not remember how > widely accepted the proposal was, though. Having a such reference > and an explicit mention of what we have and not have yet reached > consensus on (either in cover letter or after the three-dash line) > would have been very much helpful. There's been a few discussions, the other most recent one I remember was the thread over at https://lore.kernel.org/git/Y+zzh80fybq8Tn66@coredump.intra.peff.net/. (And beyond the git community, there's https://lkml.org/lkml/2023/5/7/206 in the kernel community, and if others know of discussions in other large developer communities I'd be interested in links.) The previous discussions felt to me like we were moving towards consensus, but while I found that encouraging since I think histogram would eventually be a better default, I did not make any actual proposals and try to push further towards consensus because there are a couple known issues that I think should be fixed before we consider flipping the default. I have some work-in-progress that was put on the backburner a few years ago that I would like to pick up again, and if successful, investigate how much that helps general cases in a format that can help people make educated decisions, and then again float the idea of changing the default. If consensus is reached, then we'd change the default across the board -- diff/log/merge-file/etc. rather than just the somewhat rarely used merge-file. At least, that's my current plan in this area; if others think I should investigate things in a different order or would like to see additional steps planned into this journey, please do let me know.
Thanks both. It sounds like there is a strong case for at least making "merge-file" honour the `diff.algorithm` config variable, possibly with more fine-grained config variables as suggested by Junio. I'll try to make a patch for that. Elijah, if any of your work on improving the histogram diff can be shared (for instance by writing down a description of the issues you'd like to address), I would be interested to have a look and perhaps help out if doable. But as far as I can tell there seems to be a lot of fans of the existing algorithm already, and such heuristics can never be perfect, so hopefully we can make the switch in a not too distant future. Best, Antonin
Elijah Newren <newren@gmail.com> writes: > The previous discussions felt to me like we were moving towards > consensus, but while I found that encouraging since I think histogram > would eventually be a better default, I did not make any actual > proposals and try to push further towards consensus because there are > a couple known issues that I think should be fixed before we consider > flipping the default. OK. > I have some work-in-progress that was put on > the backburner a few years ago that I would like to pick up again, and > if successful, investigate how much that helps general cases in a > format that can help people make educated decisions, and then again > float the idea of changing the default. Sounds like a good plan. > If consensus is reached, then > we'd change the default across the board -- diff/log/merge-file/etc. > rather than just the somewhat rarely used merge-file. Everybody uses diff.c::diff_algorithm somehow, will be affected by the diff.algorithm configuration variable, and everybody should honor "--diff-algorithm=<choice>" command line option to override? It sounds like a great plan. If ort for some reason performs better with histogram while format-patch works better with minimal, we could extend the configuration system to make diff.<cmd>.algorithm override the more generic diff.algorithm and do similarly silly things. We may want to also teach the diff plumbing commands to ignore diff.algorithm for better reproducibility but we may need to wait for Git 3.0 to do that. > At least, > that's my current plan in this area; if others think I should > investigate things in a different order or would like to see > additional steps planned into this journey, please do let me know. Thanks.
On 06/07/2024 08:06, Junio C Hamano wrote: > Everybody uses diff.c::diff_algorithm somehow, will be affected by > the diff.algorithm configuration variable, and everybody should > honor "--diff-algorithm=<choice>" command line option to override? I have looked into writing a patch to implement this, but it looks like the situation is quite messy. There are already half a dozen of commands which claim to honour the diff.algorithm configuration variable, but ignore it entirely. In the documentation of the recursive merge strategy (for instance in "man git-merge"), it is claimed that "recursive defaults to the diff.algorithm config setting". As far as I can tell, both from my reading of the code and my interactive testing, this is wrong. This affects the "merge", "rebase" and "pull" commands, which all three mention this configuration variable in their man page without respecting it. Ouch! I have looked for all commands which mention diff.algorithm in their man page and checked whether they indeed respect it. The "diff-index", "diff-tree" and "diff-files" commands also make this erroneous claim. The --diff-algorithm CLI option (as well as the --histogram and siblings) are respected, but not the diff.algorithm config variable. Those inconsistencies seem to be caused by the inclusion of the `diff-options.txt` file in the man pages, which leads their man pages to documenting a bunch of config variables which are in fact ignored. From a user perspective, I would say that those commands should indeed honour diff.algorithm, so I would be tempted to fix the code rather than the documentation. That being said, the man pages of those commands also mention other options which are in fact ignored, such as "diff.relative". I think it's likely that for some of those variables, the contrary is desirable: remove them from the docs because they indeed shouldn't be relied on by the command (because it does not make sense for that particular command). I don't have (yet) a good enough understanding of those commands and those options to judge this immediately, but it looks like there is some cleaning up to do. In diff.c, the code that is responsible for reading diff.* configuration variables for commands like "log", "diff" or "diff-index" divides diff.* variables into two categories: the ones that are "basic" (parsed by "git_diff_basic_config") and the ones only relevant to the "ui" (parsed by the "git_diff_ui_config" function). Surprisingly to me, the "diff.algorithm" variable belongs to the "ui" category, and is therefore skipped by commands such as "diff-index". It seems natural to me to move diff.algorithm to the "basic" section. However, that alone will not fix the problem (in my opinion much more serious) that the recursive merge strategy ignores the variable. To fix this, the parsing of this variable could either be added to "merge_recursive_config" (in merge-recursive.c), or in fact directly to "git_xmerge_config" (in xdiff-interface.c), which would then not only fix the recursive merge strategy, but also a range of other commands such as "rerere" or "merge-file". Given that I started looking into this with a specific interest in "merge-file", I would obviously be tempted to fix this bug directly at the xmerge level, and I think it would indeed make sense for other affected commands (surely "rerere" should benefit from using merge options that are consistent with the ones used for "git merge" or "git rebase", no?), but I can imagine that it's too bold a step and could have unwanted consequences I am not aware of - especially since Junio recommended to add support for diff.algorithm at the diff.c level. What do you think? In any case, I would of course make sure the "ort" strategy continues to ignore diff.algorithm for now, given its current default value. If you want to try this out for yourself, I have set up a test repository at https://git.kanthaus.online/antonin/testrepo. It has two branches "master" and "theirs", which you can try to merge/rebase, for instance "git merge -s recursive theirs" while being on master and having "diff.algorithm" set to "histogram". The merge scenario is designed to fail with a conflict if the myers algorithm is used, and succeed if histogram is used. You should be able to see that "git merge -s recursive theirs" will fail but "git merge -s recursive -Xdiff-algorithm=histogram theirs" should work. Best, Antonin
Antonin Delpeuch <antonin@delpeuch.eu> writes: > In the documentation of the recursive merge strategy (for instance in > "man git-merge"), it is claimed that "recursive defaults to the > diff.algorithm config setting". As far as I can tell, both from my > reading of the code and my interactive testing, this is wrong. This > affects the "merge", "rebase" and "pull" commands, which all three > mention this configuration variable in their man page without respecting > it. Ouch! Thanks for digging. > I have looked for all commands which mention diff.algorithm in their man > page and checked whether they indeed respect it. The "diff-index", > "diff-tree" and "diff-files" commands also make this erroneous claim. It is not erroneous if we say that these 3 diff plumbing commands ignore the configuration variable. They should ignore end-user configuration for reproducibility. See also my earlier response to Elijah <xmqqed873vgn.fsf@gitster.g> on a related topic. > The --diff-algorithm CLI option (as well as the --histogram and > siblings) are respected, but not the diff.algorithm config variable. Then they are behaving exactly as designed, which is good. We still need to correct their documentation, though. > Those inconsistencies seem to be caused by the inclusion of the > `diff-options.txt` file in the man pages, which leads their man pages to > documenting a bunch of config variables which are in fact ignored. That is quite understandable mistake ;-) "git merge" and other end-user facing commands should be taught to pay attention to both the command line option and the configuration variable. The plumbing commands should pay attention to the command line only. > In any case, I would of course make sure the "ort" strategy continues to > ignore diff.algorithm for now, given its current default value. It may make the effort easy to follow if you do this step-wise: (1) start with "all Porcelain commands pay attention to the same diff.algorithm variable. The plumbing commands ignore diff.algorithm. All commands, either Porcelain or plumbing, may have different default when unconfigured (like ort does)". (2) then add "each Porcelain command <cmd> pays attention to diff.<cmd>.algorithm if defined, otherwise diff.algorithm is used as a fallback default. There is no diff.<cmd>.algorithm for plumbing commands---they are designed not to be affected by the configuration variables". (3) optionally, doing "all Porcelain commands, when not configured, will use the same default (ort is no longer special---everybody falls back to algorithm X)" may be desiable for consistency and simplicity, but it would probably want further discussion can be left outside of the topic (e.g. right now the best candidate for X may be histogram, but is it suitable for all commands? should this extend to plumbing, making diff-index for example to use X as the default not myers when the command line does not specify --diff-algorithm?) Thanks.
diff --git a/builtin/merge-file.c b/builtin/merge-file.c index 1f987334a31..dce2676415e 100644 --- a/builtin/merge-file.c +++ b/builtin/merge-file.c @@ -29,6 +29,8 @@ static int label_cb(const struct option *opt, const char *arg, int unset) return 0; } +static int explicit_diff_algorithm = 0; + static int set_diff_algorithm(xpparam_t *xpp, const char *alg) { @@ -50,6 +52,8 @@ static int diff_algorithm_cb(const struct option *opt, return error(_("option diff-algorithm accepts \"myers\", " "\"minimal\", \"patience\" and \"histogram\"")); + explicit_diff_algorithm = 1; + return 0; } @@ -103,6 +107,10 @@ int cmd_merge_file(int argc, const char **argv, const char *prefix) return error_errno("failed to redirect stderr to /dev/null"); } + if (!explicit_diff_algorithm) { + warning(_("--diff-algorithm not provided, defaulting to \"myers\". This default will change to \"histogram\" in a future version.")); + } + if (object_id) setup_git_directory(); diff --git a/t/t6403-merge-file.sh b/t/t6403-merge-file.sh index fb872c5a113..9d0045be955 100755 --- a/t/t6403-merge-file.sh +++ b/t/t6403-merge-file.sh @@ -540,8 +540,9 @@ test_expect_success 'merging C files with "myers" diff algorithm creates some sp } EOF - test_must_fail git merge-file -p --diff3 --diff-algorithm myers ours.c base.c theirs.c >myers_output.c && - test_cmp expect.c myers_output.c + test_must_fail git merge-file -p --diff3 ours.c base.c theirs.c >myers_output.c 2> err && + test_cmp expect.c myers_output.c && + grep "diff-algorithm not provided" err ' test_expect_success 'merging C files with "histogram" diff algorithm avoids some spurious conflicts' '