diff mbox series

Re* [PATCH v2] fixup! mergetool: add automerge configuration

Message ID xmqqa6thcn1n.fsf_-_@gitster.c.googlers.com (mailing list archive)
State New, archived
Headers show
Series Re* [PATCH v2] fixup! mergetool: add automerge configuration | expand

Commit Message

Junio C Hamano Jan. 10, 2021, 6:40 a.m. UTC
Junio C Hamano <gitster@pobox.com> writes:

> Seth House <seth@eseth.com> writes:
>
>> On Sat, Jan 09, 2021 at 02:42:36PM -0800, David Aguilar wrote:
>>> Replace "\r" with a substituted variable that contains "\r".
>>> Replace "\?" with "\{0,1\}".
>>
>> Nice. I was (very slowly) converging on that as the culprit. Thanks for
>> the elegant fix! I'm running the test suite on Windows and OSX (now that
>> they're set up locally) with this included and I'll send a v10 once
>> complete.
>
> Note that the topic fails t7800.5 even with the fix-up (and without
> these fix-up on a platform with sed that do not need the portability
> fix-up).

It seems that 51b27370 (mergetool: break setup_tool out into
separate initialization function, 2020-12-28) is the culprit.

git-difftool--helper is taught to do this:

    diff --git a/git-difftool--helper.sh b/git-difftool--helper.sh
    index 46af3e60b7..234dd6944e 100755
    --- a/git-difftool--helper.sh
    +++ b/git-difftool--helper.sh
     ...
    @@ -79,6 +80,7 @@ if test -n "$GIT_DIFFTOOL_DIRDIFF"
     then
            LOCAL="$1"
            REMOTE="$2"
    +	initialize_merge_tool "$merge_tool" &&
            run_merge_tool "$merge_tool" false
     else
            # Launch the merge tool on each path provided by 'git diff'

But the thing is, t7800.5 gives a name of merge-tool that does not
exist, and expects difftool to notice it and error out.  The callchain
starting from the above "run_merge_tool" should look like

    run_merge_tool () {
        merge_tool_path=$(get_merge_tool_path ...) || exit
	...

which in turn calls

    get_merge_tool_path () {
        # A merge tool has been set, so verify that it's valid.
        merge_tool="$1"
        if ! valid_tool "$merge_tool"
        then
                echo >&2 "Unknown merge tool $merge_tool"
                exit 1
        fi

and "valid_tool bad-tool" would return false, which lets us safely
exit with the error message.

But the call to initialize_merge_tool inserted above, that makes us
to SKIP the call to run_merge_tool, would defeat the error detection.

An ugly workaround patch that caters only to difftool breakage is
attached at the end; I did not look if a similar treatment is
necessary for the mergetool side.

By the way, debugging this was somewhat painful as difftool is partly
rewritten in C.  If it were still pure script, it would have been a
lot easier to diagnose with a single "set -x" somewhere X-<.

----- >8 ---------- >8 ---------- >8 ---------- >8 -----
From: Junio C Hamano <gitster@pobox.com>
Date: Sat, 9 Jan 2021 22:35:18 -0800
Subject: [PATCH] fixup! mergetool: break setup_tool out into
    separate initialization function

At least difftool wants to see even a broken tool name in its call
to run_merge_tool and have it diagnose an error.  &&-cascading the
call to initialize_merge_tool and run_merge_tool means that a bogus
tool name that does not please initialize_merge_tool is not even seen
by run_merge_tool and the error goes unnoticed.

I didn't audit the original commit for its use of initialize_merge_tool
on the merge-tool side.  It may share the same issue, or it may not.

This should fix the errors we've been seeing in t7800.5 (and
possibly others) with this topic.
---
 git-difftool--helper.sh | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Comments

Seth House Jan. 10, 2021, 7:29 a.m. UTC | #1
On Sat, Jan 09, 2021 at 10:40:20PM -0800, Junio C Hamano wrote:
> An ugly workaround patch that caters only to difftool breakage is
> attached at the end; I did not look if a similar treatment is
> necessary for the mergetool side.

That fixup does the trick on my machine too. Thank you.

How wary are you of continuing with `initialize_merge_tool`? Do you see
a better approach to get `automerge_enabled `into scope? While it is
a nice feature to have, is it worth the risk vs. reward?
Junio C Hamano Jan. 10, 2021, 11:24 a.m. UTC | #2
Seth House <seth@eseth.com> writes:

> On Sat, Jan 09, 2021 at 10:40:20PM -0800, Junio C Hamano wrote:
>> An ugly workaround patch that caters only to difftool breakage is
>> attached at the end; I did not look if a similar treatment is
>> necessary for the mergetool side.
>
> That fixup does the trick on my machine too. Thank you.

Note that with t7800 fixed with the patch, non Windows jobs all seem
to pass, but t7610 seems to have problem(s) on Windows.

https://github.com/git/git/runs/1675932107?check_suite_focus=true#step:7:10373

> How wary are you of continuing with `initialize_merge_tool`? Do you see
> a better approach to get `automerge_enabled `into scope? While it is
> a nice feature to have, is it worth the risk vs. reward?

Hmph.  We need to have a way to define helper routines customized
for the tool, and in the codebase without this series, it is the job
for setup_tool.  It defines fallback implementations, allows tool
specific customizations.  

Your initialize_merge_tool is just a thin wrapper around setup_tool.
It calls setup_tool, and if the function exits with non-zero status,
returns with status==1 (and otherwise returns with status==0).  As I
expect all the callers of setup_tool or initialize_merge_tool would
either ignore the status or check if it succeeded (i.e. compare $?
against 0 and any non-zero values are treated equally), it does not
seem to do anything useful.

I think we may be able to get rid of initialize_merge_tool, but you
would need to call setup_tool in places initialize_merge_tool was
called in your patch, as you must have needed to make sure that the
tool specific customizations have been carried out before going
forward in these places.

So, no, I do not see a reason to be wary of initialize/setup.  

With or without the seemingly needless initialize wrapper, I think
calling setup before starting to do certain operations that need
tool specific customization is just necessary.  The same machanism
has been in use to give can_merge/can_diff to each tool and the way
it works ought to be fairly well understood.

It is a different story if it makes sense not to exit when you see
failure from initialize/setup, and instead _skip_ running helpers
like run_merge_tool.  It was just a mistake we all make every once
in a while (i.e. a bug), and I am reasonably sure that we will
introduce more of them but we will be capable of fixing all.

Thanks.
Seth House Jan. 16, 2021, 4:24 a.m. UTC | #3
On Sun, Jan 10, 2021 at 03:24:48AM -0800, Junio C Hamano wrote:
> Note that with t7800 fixed with the patch, non Windows jobs all seem
> to pass, but t7610 seems to have problem(s) on Windows.

The autocrlf test is breaking because the sed that ships with some mingw
versions (and also some minsys and cygwin versions) will *automatically*
remove carriage returns:

$ printf 'foo\r\nbar\r\n' | sed -e '/bar/d' | cat -A
foo$

$ printf 'foo\r\nbar\r\n' | sed -b -e '/bar/d' | cat -A
foo^M$

(Note: the -b flag above is just for comparison. We can't use it here.
It's not in POSIX and is not present in sed for busybox or OSX.)

I haven't found official docs that describe this behavior yet but
I found a few discussions about it across a few lists. E.g.:
https://cygwin.com/pipermail/cygwin/2017-June/233121.html

Suggestions welcome.

Obviously we could try to detect crlf's before the sed and then try to
re-add them back in after sed but that strikes me as error-prone.

I recall someone mentioning calling merge-file twice, once with --ours
and once with --theirs, to independently generate each "side" of the
conflict instead of using sed to split MERGED. Is that a viable
approach?
brian m. carlson Jan. 22, 2021, 2:50 a.m. UTC | #4
On 2021-01-16 at 04:24:54, Seth House wrote:
> On Sun, Jan 10, 2021 at 03:24:48AM -0800, Junio C Hamano wrote:
> > Note that with t7800 fixed with the patch, non Windows jobs all seem
> > to pass, but t7610 seems to have problem(s) on Windows.
> 
> The autocrlf test is breaking because the sed that ships with some mingw
> versions (and also some minsys and cygwin versions) will *automatically*
> remove carriage returns:
> 
> $ printf 'foo\r\nbar\r\n' | sed -e '/bar/d' | cat -A
> foo$
> 
> $ printf 'foo\r\nbar\r\n' | sed -b -e '/bar/d' | cat -A
> foo^M$
> 
> (Note: the -b flag above is just for comparison. We can't use it here.
> It's not in POSIX and is not present in sed for busybox or OSX.)

Can you report this as a bug?  This behavior isn't compliant with POSIX
and it makes it really hard for folks to write portable code if these
versions implement POSIX utilities in a nonstandard way.  As a
non-Windows user, I have no hope of writing code that works on Windows
if we can't rely on our standard utilities working properly.
Johannes Schindelin Jan. 22, 2021, 4:29 p.m. UTC | #5
Hi brian,

On Fri, 22 Jan 2021, brian m. carlson wrote:

> On 2021-01-16 at 04:24:54, Seth House wrote:
> > On Sun, Jan 10, 2021 at 03:24:48AM -0800, Junio C Hamano wrote:
> > > Note that with t7800 fixed with the patch, non Windows jobs all seem
> > > to pass, but t7610 seems to have problem(s) on Windows.
> >
> > The autocrlf test is breaking because the sed that ships with some mingw
> > versions (and also some minsys and cygwin versions) will *automatically*
> > remove carriage returns:
> >
> > $ printf 'foo\r\nbar\r\n' | sed -e '/bar/d' | cat -A
> > foo$
> >
> > $ printf 'foo\r\nbar\r\n' | sed -b -e '/bar/d' | cat -A
> > foo^M$
> >
> > (Note: the -b flag above is just for comparison. We can't use it here.
> > It's not in POSIX and is not present in sed for busybox or OSX.)
>
> Can you report this as a bug?  This behavior isn't compliant with POSIX
> and it makes it really hard for folks to write portable code if these
> versions implement POSIX utilities in a nonstandard way.  As a
> non-Windows user, I have no hope of writing code that works on Windows
> if we can't rely on our standard utilities working properly.

I fear that the Windows-based tools do the correct thing, though: they are
meant to process _text_, and newlines are supposed to be
platform-dependent in text.

From that perspective, it sounds to me as if we're trying to ask `sed` to
do something it was not designed to do: binary editing.

Ciao,
Dscho
brian m. carlson Jan. 22, 2021, 11:25 p.m. UTC | #6
On 2021-01-22 at 16:29:46, Johannes Schindelin wrote:
> Hi brian,
> 
> On Fri, 22 Jan 2021, brian m. carlson wrote:
> 
> > On 2021-01-16 at 04:24:54, Seth House wrote:
> > > The autocrlf test is breaking because the sed that ships with some mingw
> > > versions (and also some minsys and cygwin versions) will *automatically*
> > > remove carriage returns:
> > >
> > > $ printf 'foo\r\nbar\r\n' | sed -e '/bar/d' | cat -A
> > > foo$
> > >
> > > $ printf 'foo\r\nbar\r\n' | sed -b -e '/bar/d' | cat -A
> > > foo^M$
> > >
> > > (Note: the -b flag above is just for comparison. We can't use it here.
> > > It's not in POSIX and is not present in sed for busybox or OSX.)
> >
> > Can you report this as a bug?  This behavior isn't compliant with POSIX
> > and it makes it really hard for folks to write portable code if these
> > versions implement POSIX utilities in a nonstandard way.  As a
> > non-Windows user, I have no hope of writing code that works on Windows
> > if we can't rely on our standard utilities working properly.
> 
> I fear that the Windows-based tools do the correct thing, though: they are
> meant to process _text_, and newlines are supposed to be
> platform-dependent in text.

Ah, but POSIX gives a very specific meaning to "newline", and it refers
to a single byte.  If you want tools that process CRLF line endings like
that, then that should be opt-in as either different tools or additional
options, not the default behavior of a POSIX tool.  This behavior is not
conforming to POSIX and it is therefore a defect.

> From that perspective, it sounds to me as if we're trying to ask `sed` to
> do something it was not designed to do: binary editing.

Most Windows tools are perfectly capable of handling LF line endings.
Even the famously incapable Notepad can now handle LF without CR.  With
the advent of WSL, handling LF line endings is now pretty much required.
Johannes Schindelin Jan. 26, 2021, 2:32 p.m. UTC | #7
Hi brian,

On Fri, 22 Jan 2021, brian m. carlson wrote:

> On 2021-01-22 at 16:29:46, Johannes Schindelin wrote:
> >
> > On Fri, 22 Jan 2021, brian m. carlson wrote:
> >
> > > On 2021-01-16 at 04:24:54, Seth House wrote:
> > > > The autocrlf test is breaking because the sed that ships with some mingw
> > > > versions (and also some minsys and cygwin versions) will *automatically*
> > > > remove carriage returns:
> > > >
> > > > $ printf 'foo\r\nbar\r\n' | sed -e '/bar/d' | cat -A
> > > > foo$
> > > >
> > > > $ printf 'foo\r\nbar\r\n' | sed -b -e '/bar/d' | cat -A
> > > > foo^M$
> > > >
> > > > (Note: the -b flag above is just for comparison. We can't use it here.
> > > > It's not in POSIX and is not present in sed for busybox or OSX.)
> > >
> > > Can you report this as a bug?  This behavior isn't compliant with POSIX
> > > and it makes it really hard for folks to write portable code if these
> > > versions implement POSIX utilities in a nonstandard way.  As a
> > > non-Windows user, I have no hope of writing code that works on Windows
> > > if we can't rely on our standard utilities working properly.
> >
> > I fear that the Windows-based tools do the correct thing, though: they are
> > meant to process _text_, and newlines are supposed to be
> > platform-dependent in text.
>
> Ah, but POSIX gives a very specific meaning to "newline", and it refers
> to a single byte.  If you want tools that process CRLF line endings like
> that, then that should be opt-in as either different tools or additional
> options, not the default behavior of a POSIX tool.  This behavior is not
> conforming to POSIX and it is therefore a defect.

If we needed another reminder that

	we never say "It's in POSIX; we'll happily ignore your needs
	should your system not conform to it."

(https://github.com/git/git/blob/v2.30.0/Documentation/CodingGuidelines),
then we have it right here ;-)

> > From that perspective, it sounds to me as if we're trying to ask `sed` to
> > do something it was not designed to do: binary editing.
>
> Most Windows tools are perfectly capable of handling LF line endings.
> Even the famously incapable Notepad can now handle LF without CR.  With
> the advent of WSL, handling LF line endings is now pretty much required.

I have two comments on that:

1. We could spend a splendid time questioning MSYS2's (and before that,
   MSys') choices regarding newlines, but I think we can spend that time a
   lot better.

2. Newer software _seems_ to handle LF line endings just fine. And the
   `sed` invocation you mentioned above does so: it groks input delimited
   by Line Feed as newline characters. That does not mean that its output
   is LF-only by default. I seem to remember that recent Visual Studio
   versions did something similar: happily read an LF-only `.xml` file,
   but then write out a modified version using CR/LF.

Would I wish that Windows used LF-only instead of CR/LF, just like macOS
switched away from CR-only to LF-only after MacOS 9? Sure I do. It would
remove quite a few obstacles in my daily work. Can I do anything about it?
No.

So I'd rather see `git mergetool` be turned into a portable C program, or
alternatively using a built-in helper that _is_ written in C, to perform
that desired text munging instead of losing the battle to the challenge of
cross-platform, advanced text processing in pure shell script.

Ciao,
Dscho
Seth House Jan. 26, 2021, 6:06 p.m. UTC | #8
On Tue, Jan 26, 2021 at 03:32:13PM +0100, Johannes Schindelin wrote:
> So I'd rather see `git mergetool` be turned into a portable C program, or
> alternatively using a built-in helper that _is_ written in C, to perform
> that desired text munging

I tend to agree. Though my personal preference is Cygwin's (eventual)
approach, I can appreciate the arguments made by the MSYS2 folk. But
setting that aside, IMO, the ideal place to handle this would be the
same place where the conflict markers are written in the first place,
xmerge.c if my limited C literacy is correct.

I don't see a big distinction between writing a single file with
conflict markers and writing two, diff-able files with each "side" of
the conflict -- they're ultimately two different formats for expressing
the same information. That would give us the portability you described
and the (pretty amazing) performance that merge-file already enjoys. :)

I'm more than happy with calling merge-file twice for now. A future
C optimisation, perhaps exposed via merge-file as a new (e.g.)
--write-conflict-files flag, would be even more awesome.
Junio C Hamano Jan. 26, 2021, 8:10 p.m. UTC | #9
Seth House <seth@eseth.com> writes:

> On Tue, Jan 26, 2021 at 03:32:13PM +0100, Johannes Schindelin wrote:
>> So I'd rather see `git mergetool` be turned into a portable C program, or
>> alternatively using a built-in helper that _is_ written in C, to perform
>> that desired text munging
>
> I tend to agree. Though my personal preference is Cygwin's (eventual)
> approach, I can appreciate the arguments made by the MSYS2 folk. But
> setting that aside, IMO, the ideal place to handle this would be the
> same place where the conflict markers are written in the first place,
> xmerge.c if my limited C literacy is correct.
>
> I don't see a big distinction between writing a single file with
> conflict markers and writing two, diff-able files with each "side" of
> the conflict -- they're ultimately two different formats for expressing
> the same information. That would give us the portability you described
> and the (pretty amazing) performance that merge-file already enjoys. :)
>
> I'm more than happy with calling merge-file twice for now. A future
> C optimisation, perhaps exposed via merge-file as a new (e.g.)
> --write-conflict-files flag, would be even more awesome.

I am OK with that "two merge-file invocations, one with --ours and
then another with --theirs" approach, as I already said in
https://lore.kernel.org/git/xmqqh7n9aer5.fsf@gitster.c.googlers.com/
Seth House Jan. 27, 2021, 3:37 a.m. UTC | #10
On Tue, Jan 26, 2021 at 12:10:17PM -0800, Junio C Hamano wrote:
> I am OK with that "two merge-file invocations, one with --ours and
> then another with --theirs" approach, as I already said in
> https://lore.kernel.org/git/xmqqh7n9aer5.fsf@gitster.c.googlers.com/

Sorry if it seemed like I left that thread hanging (busy week). Thanks
for your reply(s). I'm finishing a v10 patchset with that change which
I'll submit to the list for review this week.
Junio C Hamano Jan. 29, 2021, 12:41 a.m. UTC | #11
Seth House <seth@eseth.com> writes:

> Sorry if it seemed like I left that thread hanging (busy week). Thanks
> for your reply(s). I'm finishing a v10 patchset with that change which
> I'll submit to the list for review this week.

Thanks.
diff mbox series

Patch

diff --git a/git-difftool--helper.sh b/git-difftool--helper.sh
index 234dd6944e..992124cc67 100755
--- a/git-difftool--helper.sh
+++ b/git-difftool--helper.sh
@@ -61,7 +61,9 @@  launch_merge_tool () {
 		export BASE
 		eval $GIT_DIFFTOOL_EXTCMD '"$LOCAL"' '"$REMOTE"'
 	else
-		initialize_merge_tool "$merge_tool" &&
+		initialize_merge_tool "$merge_tool"
+		# ignore the error from the above --- run_merge_tool
+		# will diagnose unusable tool by itself
 		run_merge_tool "$merge_tool"
 	fi
 }
@@ -80,7 +82,9 @@  if test -n "$GIT_DIFFTOOL_DIRDIFF"
 then
 	LOCAL="$1"
 	REMOTE="$2"
-	initialize_merge_tool "$merge_tool" &&
+	initialize_merge_tool "$merge_tool"
+	# ignore the error from the above --- run_merge_tool
+	# will diagnose unusable tool by itself
 	run_merge_tool "$merge_tool" false
 else
 	# Launch the merge tool on each path provided by 'git diff'