[v2,2/2] tests: replace mingw_test_cmp with a helper in C

Message ID	1f5366f137967cbec30041b40eedd86ce5f6e953.1662469859.git.gitgitgadget@gmail.com (mailing list archive)
State	New, archived
Headers	Return-Path: <git-owner@kernel.org>
	Message-Id: <1f5366f137967cbec30041b40eedd86ce5f6e953.1662469859.git.gitgitgadget@gmail.com>
	In-Reply-To: <pull.1309.v2.git.1662469859.gitgitgadget@gmail.com>
	References: <pull.1309.git.1659106382128.gitgitgadget@gmail.com>
		<pull.1309.v2.git.1662469859.gitgitgadget@gmail.com>
	Date: Tue, 06 Sep 2022 13:10:58 +0000
	Subject: [PATCH v2 2/2] tests: replace mingw_test_cmp with a helper in C
	Fcc: Sent
	Content-Type: text/plain; charset=UTF-8
	Content-Transfer-Encoding: 8bit
	MIME-Version: 1.0
	To: git@vger.kernel.org
	Cc: Johannes Schindelin <johannes.schindelin@gmx.de>,
		Johannes Schindelin <johannes.schindelin@gmx.de>
	Precedence: bulk
	From: Johannes Schindelin <johannes.schindelin@gmx.de>
Series	tests: replace mingw_test_cmp with a helper in C
	[v2,0/2] tests: replace mingw_test_cmp with a helper in C
	[v2,1/2] t0021: use Windows-friendly `pwd`
	[v2,2/2] tests: replace mingw_test_cmp with a helper in C

Commit Message

Johannes Schindelin Sept. 6, 2022, 1:10 p.m. UTC

From: Johannes Schindelin <johannes.schindelin@gmx.de>

This helper is more performant than running the `mingw_test_cmp` code
with MSYS2's Bash. And a lot more readable.

To accommodate t1050, which wants to compare files weighing in with 3MB
(falling outside of t1050's malloc limit of 1.5MB), we simply lift the
allocation limit by setting the environment variable GIT_ALLOC_LIMIT to
zero when calling the helper.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 Makefile                 |  1 +
 t/helper/test-text-cmp.c | 78 ++++++++++++++++++++++++++++++++++++++++
 t/helper/test-tool.c     |  1 +
 t/helper/test-tool.h     |  1 +
 t/test-lib-functions.sh  | 68 +----------------------------------
 t/test-lib.sh            |  2 +-
 6 files changed, 83 insertions(+), 68 deletions(-)
 create mode 100644 t/helper/test-text-cmp.c

Comments

Ævar Arnfjörð Bjarmason Sept. 7, 2022, 11:57 a.m. UTC | #1

On Tue, Sep 06 2022, Johannes Schindelin via GitGitGadget wrote:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
> [...]
> +++ b/t/helper/test-text-cmp.c
> @@ -0,0 +1,78 @@
> +#include "test-tool.h"
> +#include "git-compat-util.h"
> +#include "strbuf.h"
> +#include "gettext.h"

Superfluous header? Compiles without gettext.h for me (and we shouldn't
use i18n in test helpers).

> [...]
> +int cmd__text_cmp(int argc, const char **argv)
> +{
> +	FILE *f0, *f1;
> +	struct strbuf b0 = STRBUF_INIT, b1 = STRBUF_INIT;
> +
> +	if (argc != 3)
> +		die("Require exactly 2 arguments, got %d", argc);

Here you conflate argc with the number of arguments after the
"text-cmp" subcommand name, resulting in:

	helper/test-tool text-cmp 2
	fatal: Require exactly 2 arguments, got 2

An argc-- argv++ at the beginning seems like the easiest way out of
this. Also s/Require/require/ per CodingGuidelines.
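
A minimal sketch of the suggested fix, not the code that was eventually
applied: shifting past the subcommand name first makes argc count only the
operands, so the check and the message agree. `parse_operands` and its
output parameters are hypothetical names used for illustration; the real
helper would presumably call die() rather than print and return.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the argument handling in cmd__text_cmp().
 * On entry argv[0] is the subcommand name ("text-cmp"), as in the patch.
 * Returns 0 and stores the two operands on success, -1 otherwise.
 */
static int parse_operands(int argc, const char **argv,
			  const char **path1, const char **path2)
{
	assert(argc >= 1 && argv != NULL);

	/* Skip the subcommand name so that argc counts operands only. */
	argc--;
	argv++;

	if (argc != 2) {
		fprintf(stderr, "require exactly 2 arguments, got %d\n", argc);
		return -1;
	}
	*path1 = argv[0];
	*path2 = argv[1];
	return 0;
}
```

With this shape, the invocation shown above would report "got 1" instead
of the self-contradictory "got 2".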

> +	if (!strcmp(argv[1], "-") && !strcmp(argv[2], "-"))
> +		die("only one parameter can refer to `stdin` but not both");
> +
> +	if (!(f0 = !strcmp(argv[1], "-") ? stdin : fopen(argv[1], "r")))
> +		return error_errno("could not open '%s'", argv[1]);
> +	if (!(f1 = !strcmp(argv[2], "-") ? stdin : fopen(argv[2], "r"))) {
> +		fclose(f0);
> +		return error_errno("could not open '%s'", argv[2]);
> +	}

Faithfully emulating the old version. I do wonder if we couldn't simply
adjust the handful of tests that actually make use of the "-" diff(1)
feature. AFAICT there's around 10 of those at most, and they all seem
like cases where it would be easy to change:

	(echo foo) | test_cmp - actual

Or whatever, to:

	echo foo >expected &&
	test_cmp expected actual

...

> +			if (!strcmp(argv[1], "-") || !strcmp(argv[2], "-"))
> +				warning("cannot show diff because `stdin` was already consumed");

...

Which means we wouldn't need to punt on this.

> +			else if (!run_diff(argv[1], argv[2]))
> +				die("Huh? 'diff --no-index %s %s' succeeded",
> +				    argv[1], argv[2]);

I tried manually testing this with:

	GIT_TRACE=1 GIT_TEST_CMP="/home/avar/g/git/git diff --no-index --" ./t0021-conversion.sh  -vixd

vs.:

	GIT_TRACE=1 GIT_TEST_CMP="$PWD/helper/test-tool text-cmp" ./t0021-conversion.sh  -vixd

Your version doesn't get confused by the same, but AFAICT this is by
fragile accident.

I.e. you run your own equivalent of "cmp", so because the files are the
same in that case we don't run the "diff --no-index".

But the "diff --no-index" in that t0021*.sh case *would* report
differences, even though the files are byte-for-byte identical.

So the "cmp"-a-like here isn't just an optimization to avoid forking the
"git diff" process, it's an entirely different comparison method in
cases where we have a "filter".

It just so happens that our test suite doesn't currently combine them in
a way that causes a current failure.
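
The kind of comparison under discussion, one that treats CRLF and LF line
endings as equal without forking a diff, can be sketched in plain C. This
is an independent illustration rather than the patch's implementation
(which reads lines with strbuf_getline()); `next_normalized` and
`text_equal` are hypothetical names, and edge cases such as a missing
final newline may be handled differently here than in the helper.

```c
#include <assert.h>
#include <stdio.h>

/* Read one byte, folding a CR that directly precedes an LF into the LF. */
static int next_normalized(FILE *f)
{
	int c = fgetc(f);

	if (c == '\r') {
		int next = fgetc(f);

		if (next == '\n')
			return '\n';
		/* A lone CR is kept; push back whatever followed it. */
		if (next != EOF)
			ungetc(next, f);
	}
	return c;
}

/* Return 1 if both streams hold the same text modulo CRLF vs. LF, else 0. */
static int text_equal(FILE *a, FILE *b)
{
	assert(a != NULL && b != NULL);

	for (;;) {
		int ca = next_normalized(a);
		int cb = next_normalized(b);

		if (ca != cb)
			return 0;
		if (ca == EOF)
			return 1;
	}
}
```

Note that this only answers "same or different"; it says nothing about
what a separately spawned "diff --no-index" would report for the same
pair of files, which is the concern raised above.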

>  test_cmp () {
>  	test "$#" -ne 2 && BUG "2 param"
> -	eval "$GIT_TEST_CMP" '"$@"'
> +	GIT_ALLOC_LIMIT=0 eval "$GIT_TEST_CMP" '"$@"'
>  }

Further, we have a clear boundary in the test suite between "git" and
"test-tool" things we invoke, and third party tools. The former we put
in "test_must_fail_acceptable".

When using this new helper we'd hide potential segfaults and BUGs in any
"! test_cmp" invocation.

To avoid the introduction of such a blindspot we'd need to change
"test_cmp" to take an optional "!" as the 1st argument, and convert the
existing "! test_cmp" to "test_cmp !", then carry some flag to indicate
that our "GIT_TEST_CMP" is a git or test-tool invocation, and check it
appropriately.
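
To illustrate the distinction drawn above: a caller that wants to accept
"files differ" but still catch crashes has to look at how the comparison
command ended, not merely whether it failed. A rough, POSIX-only sketch,
assuming the convention that exit status 1 means "differ" and anything
else (a higher exit code, death by signal) means something went wrong.
`classify_cmp_status`, `run_and_classify` and the enum are hypothetical
names, and this is not how git's test library implements the check.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/wait.h>

enum cmp_outcome {
	CMP_SAME,	/* exited with 0 */
	CMP_DIFFER,	/* exited with 1: the one acceptable "failure" */
	CMP_BROKEN	/* any other exit code, or killed by a signal */
};

/* Classify a wait status as returned by system() or waitpid(). */
static enum cmp_outcome classify_cmp_status(int status)
{
	if (WIFEXITED(status)) {
		switch (WEXITSTATUS(status)) {
		case 0:
			return CMP_SAME;
		case 1:
			return CMP_DIFFER;
		}
	}
	return CMP_BROKEN;
}

/* Run a shell command and classify how it ended. */
static enum cmp_outcome run_and_classify(const char *cmd)
{
	assert(cmd != NULL);
	return classify_cmp_status(system(cmd));
}
```

A plain "!" in front of the command collapses CMP_DIFFER and CMP_BROKEN
into the same "success", which is the blind spot described above.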

> [...]
> diff --git a/t/test-lib.sh b/t/test-lib.sh
> index 7726d1da88a..0be25ecbd59 100644
> --- a/t/test-lib.sh
> +++ b/t/test-lib.sh
> @@ -1546,7 +1546,7 @@ case $uname_s in
>  	test_set_prereq SED_STRIPS_CR
>  	test_set_prereq GREP_STRIPS_CR
>  	test_set_prereq WINDOWS
> -	GIT_TEST_CMP=mingw_test_cmp
> +	GIT_TEST_CMP="test-tool text-cmp"
>  	;;
>  *CYGWIN*)
>  	test_set_prereq POSIXPERM

Not a new problem, but this is incompatible with
GIT_TEST_CMP_USE_COPIED_CONTEXT.

What is new though is that with this series there's no longer a good
reason AFAICT to carry GIT_TEST_CMP_USE_COPIED_CONTEXT at all. I.e. we
have it for a "diff" that doesn't understand "-u".

If (after getting past the caveats noted above) we could simply invoke
our own test-tool we could drop that special-casing & just always invoke
our own test_cmp helper.

Ævar Arnfjörð Bjarmason Sept. 7, 2022, 12:24 p.m. UTC | #2

On Wed, Sep 07 2022, Ævar Arnfjörð Bjarmason wrote:

> On Tue, Sep 06 2022, Johannes Schindelin via GitGitGadget wrote:
>
>> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>> [...]
>> +++ b/t/helper/test-text-cmp.c
>> @@ -0,0 +1,78 @@
>> +#include "test-tool.h"
>> +#include "git-compat-util.h"
>> +#include "strbuf.h"
>> +#include "gettext.h"
>
> Superfluous header? Compiles without gettext.h for me (and we shouldn't
> use i18n in test helpers).
>
>> [...]
>> +int cmd__text_cmp(int argc, const char **argv)
>> +{
>> +	FILE *f0, *f1;
>> +	struct strbuf b0 = STRBUF_INIT, b1 = STRBUF_INIT;
>> +
>> +	if (argc != 3)
>> +		die("Require exactly 2 arguments, got %d", argc);
>
> Here you conflate argc with the number of arguments after the
> "text-cmp" subcommand name, resulting in:
>
> 	helper/test-tool text-cmp 2
> 	fatal: Require exactly 2 arguments, got 2
>
> An argc-- argv++ at the beginning seems like the easiest way out of
> this. Also s/Require/require/ per CodingGuidelines.
>
>> +	if (!strcmp(argv[1], "-") && !strcmp(argv[2], "-"))
>> +		die("only one parameter can refer to `stdin` but not both");
>> +
>> +	if (!(f0 = !strcmp(argv[1], "-") ? stdin : fopen(argv[1], "r")))
>> +		return error_errno("could not open '%s'", argv[1]);
>> +	if (!(f1 = !strcmp(argv[2], "-") ? stdin : fopen(argv[2], "r"))) {
>> +		fclose(f0);
>> +		return error_errno("could not open '%s'", argv[2]);
>> +	}
>
> Faithfully emulating the old version. I do wonder if we couldn't simply
> adjust the handful of tests that actually make use of the "-" diff(1)
> feature. AFAICT there's around 10 of those at most, and they all seem
> like cases where it would be easy to change:
>
> 	(echo foo) | test_cmp - actual
>
> Or whatever, to:
>
> 	echo foo >expected &&
> 	test_cmp expected actual
>
> ...
>
>> +			if (!strcmp(argv[1], "-") || !strcmp(argv[2], "-"))
>> +				warning("cannot show diff because `stdin` was already consumed");
>
> ...
>
> Which means we wouldn't need to punt on this.
>
>> +			else if (!run_diff(argv[1], argv[2]))
>> +				die("Huh? 'diff --no-index %s %s' succeeded",
>> +				    argv[1], argv[2]);
>
> I tried manually testing this with:
>
> 	GIT_TRACE=1 GIT_TEST_CMP="/home/avar/g/git/git diff --no-index --" ./t0021-conversion.sh  -vixd
>
> vs.:
>
> 	GIT_TRACE=1 GIT_TEST_CMP="$PWD/helper/test-tool text-cmp" ./t0021-conversion.sh  -vixd
>
> Your version doesn't get confused by the same, but AFAICT this is by
> fragile accident.
>
> I.e. you run your own equivalent of "cmp", so because the files are the
> same in that case we don't run the "diff --no-index".
>
> But the "diff --no-index" in that t0021*.sh case *would* report
> differences, even though the files are byte-for-byte identical.
>
> So the "cmp"-a-like here isn't just an optimization to avoid forking the
> "git diff" process, it's an entirely different comparison method in
> cases where we have a "filter".
>
> It just so happens that our test suite doesn't currently combine them in
> a way that causes a current failure.

Ah (partially?) I spoke too soon on this part. I.e. the
GIT_DIR=/dev/null precludes reading the filter/repo in this case. So I
*think* we're out of the woods as far as this is concerned.

Still, it would be nice to document in a code comment or commit message
that the "not read the local repo's filter" is absolutely critical here.

But I think that re-raises the point René had in:
https://lore.kernel.org/git/b21d2b60-428f-58ec-28b6-3c617b9f2e45@web.de/

I ran the full test suite with:

	GIT_TEST_CMP='GIT_DIR=/dev/null HOME=/dev/null /usr/bin/git diff --no-index --ignore-cr-at-eol --'

And all of it passes, except for a test in t0001-init.sh which we could
fix up as:
	
	diff --git a/t/t0001-init.sh b/t/t0001-init.sh
	index d479303efa0..d65afe7cceb 100755
	--- a/t/t0001-init.sh
	+++ b/t/t0001-init.sh
	@@ -426,7 +426,7 @@ test_expect_success SYMLINKS 're-init to move gitdir symlink' '
	 	git init --separate-git-dir ../realgitdir
	 	) &&
	 	echo "gitdir: $(pwd)/realgitdir" >expected &&
	-	test_cmp expected newdir/.git &&
	+	test "$(test_readlink newdir/.git)" = here &&
	 	test_cmp expected newdir/here &&
	 	test_path_is_dir realgitdir/refs
	 '

Which without this series is more correct, as all we're re-testing there
is whether the symlink is pointing to what we expect. A hypothetical
"--dereference" to "git diff" would also take care of it (the equivalent
of "--no-dereference" being the default).

But with that all tests pass for me, so I'm puzzled as to the need for
the new helper, as opposed to just constructing the command above and
sticking it in GIT_TEST_CMP ...

Junio C Hamano Sept. 7, 2022, 7:45 p.m. UTC | #3

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> But I think that re-raises the point René had in:
> https://lore.kernel.org/git/b21d2b60-428f-58ec-28b6-3c617b9f2e45@web.de/

As the primary point of no-index mode was to expose fancy options
"git diff" has to comparisons of files outside version control,
without having to go through the trouble of upstreaming changes to
GNU diff, I do think "--ignore-cr-at-eol" should work fine with it,
and René's idea sounds like the best implementation for the
test-text-cmp helper command.

Thanks.

Patch

diff --git a/Makefile b/Makefile
index 1624471badc..73db55bba0f 100644
--- a/Makefile
+++ b/Makefile
@@ -786,6 +786,7 @@  TEST_BUILTINS_OBJS += test-string-list.o
 TEST_BUILTINS_OBJS += test-submodule-config.o
 TEST_BUILTINS_OBJS += test-submodule-nested-repo-config.o
 TEST_BUILTINS_OBJS += test-subprocess.o
+TEST_BUILTINS_OBJS += test-text-cmp.o
 TEST_BUILTINS_OBJS += test-trace2.o
 TEST_BUILTINS_OBJS += test-urlmatch-normalization.o
 TEST_BUILTINS_OBJS += test-userdiff.o
diff --git a/t/helper/test-text-cmp.c b/t/helper/test-text-cmp.c
new file mode 100644
index 00000000000..7c26d925086
--- /dev/null
+++ b/t/helper/test-text-cmp.c
@@ -0,0 +1,78 @@ 
+#include "test-tool.h"
+#include "git-compat-util.h"
+#include "strbuf.h"
+#include "gettext.h"
+#include "parse-options.h"
+#include "run-command.h"
+
+#ifdef WIN32
+#define NO_SUCH_DIR "\\\\.\\GLOBALROOT\\invalid"
+#else
+#define NO_SUCH_DIR "/dev/null"
+#endif
+
+static int run_diff(const char *path1, const char *path2)
+{
+	const char *argv[] = {
+		"diff", "--no-index", "--", NULL, NULL, NULL
+	};
+	const char *env[] = {
+		"GIT_PAGER=cat",
+		"GIT_DIR=" NO_SUCH_DIR,
+		"HOME=" NO_SUCH_DIR,
+		NULL
+	};
+
+	argv[3] = path1;
+	argv[4] = path2;
+	return run_command_v_opt_cd_env(argv,
+					RUN_COMMAND_NO_STDIN | RUN_GIT_CMD,
+					NULL, env);
+}
+
+int cmd__text_cmp(int argc, const char **argv)
+{
+	FILE *f0, *f1;
+	struct strbuf b0 = STRBUF_INIT, b1 = STRBUF_INIT;
+
+	if (argc != 3)
+		die("Require exactly 2 arguments, got %d", argc);
+
+	if (!strcmp(argv[1], "-") && !strcmp(argv[2], "-"))
+		die("only one parameter can refer to `stdin` but not both");
+
+	if (!(f0 = !strcmp(argv[1], "-") ? stdin : fopen(argv[1], "r")))
+		return error_errno("could not open '%s'", argv[1]);
+	if (!(f1 = !strcmp(argv[2], "-") ? stdin : fopen(argv[2], "r"))) {
+		fclose(f0);
+		return error_errno("could not open '%s'", argv[2]);
+	}
+
+	for (;;) {
+		int r0 = strbuf_getline(&b0, f0);
+		int r1 = strbuf_getline(&b1, f1);
+
+		if (r0 == EOF) {
+			fclose(f0);
+			fclose(f1);
+			strbuf_release(&b0);
+			strbuf_release(&b1);
+			if (r1 == EOF)
+				return 0;
+cmp_failed:
+			if (!strcmp(argv[1], "-") || !strcmp(argv[2], "-"))
+				warning("cannot show diff because `stdin` was already consumed");
+			else if (!run_diff(argv[1], argv[2]))
+				die("Huh? 'diff --no-index %s %s' succeeded",
+				    argv[1], argv[2]);
+			return 1;
+		}
+		if (r1 == EOF || strbuf_cmp(&b0, &b1)) {
+			fclose(f0);
+			fclose(f1);
+			strbuf_release(&b0);
+			strbuf_release(&b1);
+			goto cmp_failed;
+		}
+	}
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 318fdbab0c3..c6654ebc48b 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -81,6 +81,7 @@  static struct test_cmd cmds[] = {
 	{ "submodule-config", cmd__submodule_config },
 	{ "submodule-nested-repo-config", cmd__submodule_nested_repo_config },
 	{ "subprocess", cmd__subprocess },
+	{ "text-cmp", cmd__text_cmp },
 	{ "trace2", cmd__trace2 },
 	{ "userdiff", cmd__userdiff },
 	{ "urlmatch-normalization", cmd__urlmatch_normalization },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index bb799271631..2acfd2bcabc 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -71,6 +71,7 @@  int cmd__string_list(int argc, const char **argv);
 int cmd__submodule_config(int argc, const char **argv);
 int cmd__submodule_nested_repo_config(int argc, const char **argv);
 int cmd__subprocess(int argc, const char **argv);
+int cmd__text_cmp(int argc, const char **argv);
 int cmd__trace2(int argc, const char **argv);
 int cmd__userdiff(int argc, const char **argv);
 int cmd__urlmatch_normalization(int argc, const char **argv);
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 8c44856eaec..28eddbc8e36 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1240,7 +1240,7 @@  test_expect_code () {
 
 test_cmp () {
 	test "$#" -ne 2 && BUG "2 param"
-	eval "$GIT_TEST_CMP" '"$@"'
+	GIT_ALLOC_LIMIT=0 eval "$GIT_TEST_CMP" '"$@"'
 }
 
 # Check that the given config key has the expected value.
@@ -1541,72 +1541,6 @@  test_skip_or_die () {
 	error "$2"
 }
 
-# The following mingw_* functions obey POSIX shell syntax, but are actually
-# bash scripts, and are meant to be used only with bash on Windows.
-
-# A test_cmp function that treats LF and CRLF equal and avoids to fork
-# diff when possible.
-mingw_test_cmp () {
-	# Read text into shell variables and compare them. If the results
-	# are different, use regular diff to report the difference.
-	local test_cmp_a= test_cmp_b=
-
-	# When text came from stdin (one argument is '-') we must feed it
-	# to diff.
-	local stdin_for_diff=
-
-	# Since it is difficult to detect the difference between an
-	# empty input file and a failure to read the files, we go straight
-	# to diff if one of the inputs is empty.
-	if test -s "$1" && test -s "$2"
-	then
-		# regular case: both files non-empty
-		mingw_read_file_strip_cr_ test_cmp_a <"$1"
-		mingw_read_file_strip_cr_ test_cmp_b <"$2"
-	elif test -s "$1" && test "$2" = -
-	then
-		# read 2nd file from stdin
-		mingw_read_file_strip_cr_ test_cmp_a <"$1"
-		mingw_read_file_strip_cr_ test_cmp_b
-		stdin_for_diff='<<<"$test_cmp_b"'
-	elif test "$1" = - && test -s "$2"
-	then
-		# read 1st file from stdin
-		mingw_read_file_strip_cr_ test_cmp_a
-		mingw_read_file_strip_cr_ test_cmp_b <"$2"
-		stdin_for_diff='<<<"$test_cmp_a"'
-	fi
-	test -n "$test_cmp_a" &&
-	test -n "$test_cmp_b" &&
-	test "$test_cmp_a" = "$test_cmp_b" ||
-	eval "diff -u \"\$@\" $stdin_for_diff"
-}
-
-# $1 is the name of the shell variable to fill in
-mingw_read_file_strip_cr_ () {
-	# Read line-wise using LF as the line separator
-	# and use IFS to strip CR.
-	local line
-	while :
-	do
-		if IFS=$'\r' read -r -d $'\n' line
-		then
-			# good
-			line=$line$'\n'
-		else
-			# we get here at EOF, but also if the last line
-			# was not terminated by LF; in the latter case,
-			# some text was read
-			if test -z "$line"
-			then
-				# EOF, really
-				break
-			fi
-		fi
-		eval "$1=\$$1\$line"
-	done
-}
-
 # Like "env FOO=BAR some-program", but run inside a subshell, which means
 # it also works for shell functions (though those functions cannot impact
 # the environment outside of the test_env invocation).
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 7726d1da88a..0be25ecbd59 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1546,7 +1546,7 @@  case $uname_s in
 	test_set_prereq SED_STRIPS_CR
 	test_set_prereq GREP_STRIPS_CR
 	test_set_prereq WINDOWS
-	GIT_TEST_CMP=mingw_test_cmp
+	GIT_TEST_CMP="test-tool text-cmp"
 	;;
 *CYGWIN*)
 	test_set_prereq POSIXPERM
