[v2] t4210: detect REG_ILLSEQ dynamically
diff mbox series

Message ID 20200515195157.41217-1-carenas@gmail.com
State New
Headers show
Series
  • [v2] t4210: detect REG_ILLSEQ dynamically
Related show

Commit Message

Carlo Marcelo Arenas Belón May 15, 2020, 7:51 p.m. UTC
7187c7bbb8 (t4210: skip i18n tests that don't work on FreeBSD, 2019-11-27)
adds a REG_ILLSEQ prerequisite to avoid failures from the tests added with
4e2443b181 (log tests: test regex backends in "--encode=<enc>" tests,
2019-06-28), but hardcodes it to be only enabled for FreeBSD.

Instead of hardcoding the affected platform, add a test using test-tool to
make sure it can be dynamically detected in other affected systems (like
DragonFlyBSD or macOS), and while at it, enhance the tool so it can report
better on the cause of failure and be silent when needed.

To minimize changes, it is assumed the regcomp error is of the right type
since we control the only caller, and is also assumed to affect both basic
and extended syntax (only extended is tested, but both behave the same in
all three affected platforms).

The description of the first test which wasn't accurate has been corrected,
and unlike the original fix from 7187c7bbb8, that created a special case
for FreeBSD by adding REG_ILLSEQ to the common set in test-lib, this does
not need changes to test-lib and therefore had those reverted.

Only the affected engines will have their tests suppressed, which include
"fixed" if the PCRE optimization that uses LIBPCRE2 since b65abcafc7
(grep: use PCRE v2 for optimized fixed-string search, 2019-07-01) is not
available.

Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
v2:
- address all suggestions from Eric
- reorder tests and add a helper functions (probably overly verbose) to
  clarify when the tests will be skipped (no prereq is used)

 t/helper/test-regex.c | 19 +++++++++---
 t/t4210-log-i18n.sh   | 70 ++++++++++++++++++++++++++++++++-----------
 t/test-lib.sh         |  6 ----
 3 files changed, 68 insertions(+), 27 deletions(-)

Comments

Junio C Hamano May 15, 2020, 8:24 p.m. UTC | #1
Carlo Marcelo Arenas Belón  <carenas@gmail.com> writes:

> diff --git a/t/helper/test-regex.c b/t/helper/test-regex.c
> index 10284cc56f..7a8ddce45b 100644
> --- a/t/helper/test-regex.c
> +++ b/t/helper/test-regex.c
> @@ -41,16 +41,21 @@ int cmd__regex(int argc, const char **argv)
>  {
>  	const char *pat;
>  	const char *str;
> -	int flags = 0;
> +	int ret, silent = 0, flags = 0;
>  	regex_t r;
>  	regmatch_t m[1];
> +	char errbuf[64];
>  
>  	if (argc == 2 && !strcmp(argv[1], "--bug"))
>  		return test_regex_bug();
>  	else if (argc < 3)
>  		usage("test-tool regex --bug\n"
> -		      "test-tool regex <pattern> <string> [<options>]");
> +		      "test-tool regex [--silent] <pattern> <string> [<options>]");
>
> +	if (!strcmp(argv[1], "--silent")) {
> +		silent = 1;
> +		argv++;
> +	}

This looks fishy---if argc==3 and the first one is "--silent", only
the <pattern> is left in argv and before taking <string> out of the
argv, we need to ensure argc is still large enough, but I do not
think that is done below:

>  	argv++;
>  	pat = *argv++;
>  	str = *argv++;

So str here would be NULL and/or *argv++ would have given you an
out-of-bounds access already.

> @@ -67,8 +72,14 @@ int cmd__regex(int argc, const char **argv)
>  	}
>  	git_setup_gettext();
>  
> -	if (regcomp(&r, pat, flags))
> -		die("failed regcomp() for pattern '%s'", pat);
> +	ret = regcomp(&r, pat, flags);
> +	if (ret) {
> +		if (silent)
> +			return 1;
> +
> +		regerror(ret, &r, errbuf, sizeof(errbuf));
> +		die("failed regcomp() for pattern '%s' (%s)", pat, errbuf);
> +	}
>  	if (regexec(&r, str, 1, m, 0))
>  		return 1;

Not that it matters _too_ much as this is merely a test helper and
it would not hurt anybody as long as our callers are careful.

> diff --git a/t/t4210-log-i18n.sh b/t/t4210-log-i18n.sh
> index c3792081e6..a89f456817 100755
> --- a/t/t4210-log-i18n.sh
> +++ b/t/t4210-log-i18n.sh
> @@ -10,6 +10,12 @@ latin1_e=$(printf '\351')
>  # invalid UTF-8
>  invalid_e=$(printf '\303\50)') # ")" at end to close opening "("
>  
> +if test_have_prereq GETTEXT_LOCALE &&
> +	! LC_ALL=$is_IS_locale test-tool regex --silent $latin1_e $latin1_e EXTENDED
> +then
> +	have_reg_illseq=1
> +fi

OK.  Have we cleared have_reg_illseq shell variable before we reach
this point?  If not, we should (think: environment variable end user
had before starting the test).

> @@ -56,38 +62,68 @@ test_expect_success !MINGW 'log --grep does
>  	test_must_be_empty actual
>  '
>  
> +trigger_undefined_behaviour()
> +{

Style:

	triggers_undefined_behaviour () {

My first two readings of this patch mistakenly told me that the name
of the function was an instruction to the test to trigger an
undefined behaviour to see what happens, but this helper answers a
question "does the given engine trigger an undefined behaviour (with
the test data we are going to throw at it)?", right?  Perhaps rename
the helper to "triggerS_undefined_behaviour" would reduce the risk
of inviting such a misinterpretation.

> +	local engine=$1
> +
> +	case $engine in
> +	fixed)
> +		if test -n "$have_reg_illseq" &&
> +			! test_have_prereq LIBPCRE2
> +		then
> +			return 0
> +		else
> +			return 1
> +		fi
> +		;;
> +	basic|extended)
> +		if test -n "$have_reg_illseq"
> +		then
> +			return 0
> +		else
> +			return 1
> +		fi
> +		;;
> +	perl)
> +		return 1
> +		;;
> +	esac
> +}

... and the return value is true for "yes it would trigger undefined
behaviour" and false for "no it would not".

>  for engine in fixed basic extended perl
>  do
>  	prereq=
>  	if test $engine = "perl"
>  	then
> +		prereq=PCRE
>  	fi
>  	force_regex=
>  	if test $engine != "fixed"
>  	then
> +		force_regex='.*'
>  	fi
>  
>  	test_expect_success !MINGW,GETTEXT_LOCALE,$prereq "-c grep.patternType=$engine log --grep does not find non-reencoded values (latin1 + locale)" "
> +		LC_ALL=$is_IS_locale git -c grep.patternType=$engine log --encoding=ISO-8859-1 --format=%s --grep=\"$force_regex$utf8_e\" >actual &&

Can we do something to these overlong lines, by the way?

>  		test_must_be_empty actual
>  	"
>  
> +	if ! trigger_undefined_behaviour $engine
> +	then

Much easier to read than the ILLSEQ prerequisite, I would think,
even though the overlong lines are annoying.

> +		test_expect_success !MINGW,GETTEXT_LOCALE,$prereq "-c grep.patternType=$engine log --grep searches in log output encoding (latin1 + locale)" "
> +			cat >expect <<-\EOF &&
> +			latin1
> +			utf8
> +			EOF
> +			LC_ALL=$is_IS_locale git -c grep.patternType=$engine log --encoding=ISO-8859-1 --format=%s --grep=\"$force_regex$latin1_e\" >actual &&
> +			test_cmp expect actual
> +		"
> +
> +		test_expect_success !MINGW,GETTEXT_LOCALE,$prereq "-c grep.patternType=$engine log --grep does not die on invalid UTF-8 value (latin1 + locale + invalid needle)" "
> +			LC_ALL=$is_IS_locale git -c grep.patternType=$engine log --encoding=ISO-8859-1 --format=%s --grep=\"$force_regex$invalid_e\" >actual &&
> +			test_must_be_empty actual
> +		"
> +	fi
>  done
>  
>  test_done
> diff --git a/t/test-lib.sh b/t/test-lib.sh
> index 0ea1e5a05e..81473fea1d 100644
> --- a/t/test-lib.sh
> +++ b/t/test-lib.sh
> @@ -1454,12 +1454,6 @@ case $uname_s in
>  	test_set_prereq SED_STRIPS_CR
>  	test_set_prereq GREP_STRIPS_CR
>  	;;
> -FreeBSD)
> -	test_set_prereq REGEX_ILLSEQ
> -	test_set_prereq POSIXPERM
> -	test_set_prereq BSLASHPSPEC
> -	test_set_prereq EXECKEEPSPID
> -	;;
>  *)
>  	test_set_prereq POSIXPERM
>  	test_set_prereq BSLASHPSPEC

Nice to be able to drop one case arm from here.  Thanks.
Junio C Hamano May 15, 2020, 9:48 p.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:

>>  	else if (argc < 3)
>>  		usage("test-tool regex --bug\n"
>> -		      "test-tool regex <pattern> <string> [<options>]");
>> +		      "test-tool regex [--silent] <pattern> <string> [<options>]");
>>
>> +	if (!strcmp(argv[1], "--silent")) {
>> +		silent = 1;
>> +		argv++;
>> +	}
>
> This looks fishy---if argc==3 and the first one is "--silent", only
> the <pattern> is left in argv and before taking <string> out of the
> argv, we need to ensure argc is still large enough, but I do not
> think that is done below:
> ...
> Not that it matters _too_ much as this is merely a test helper and
> it would not hurt anybody as long as our callers are careful.

But it still bothers me.  Perhaps like this?  

If I were writing this from scratch, I would probably increment
argv++ once as early as possible and consistently access argv[0]
or *argv++, but that would make the patch even noisier.

 t/helper/test-regex.c | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/t/helper/test-regex.c b/t/helper/test-regex.c
index 7a8ddce45b..e68b4f6a73 100644
--- a/t/helper/test-regex.c
+++ b/t/helper/test-regex.c
@@ -46,16 +46,23 @@ int cmd__regex(int argc, const char **argv)
 	regmatch_t m[1];
 	char errbuf[64];
 
-	if (argc == 2 && !strcmp(argv[1], "--bug"))
-		return test_regex_bug();
-	else if (argc < 3)
-		usage("test-tool regex --bug\n"
-		      "test-tool regex [--silent] <pattern> <string> [<options>]");
+	if (argc < 2)
+		goto usage;
 
+	if (!strcmp(argv[1], "--bug")) {
+		if (argc == 2)
+			return test_regex_bug();
+		else
+			goto usage;
+	}
 	if (!strcmp(argv[1], "--silent")) {
-		silent = 1;
+		silent = 0;
 		argv++;
+		argc--;
 	}
+	if (argc < 3)
+		goto usage;
+
 	argv++;
 	pat = *argv++;
 	str = *argv++;
@@ -84,4 +91,8 @@ int cmd__regex(int argc, const char **argv)
 		return 1;
 
 	return 0;
+
+usage:
+	usage("test-tool regex --bug\n"
+	      "test-tool regex [--silent] <pattern> <string> [<options>]");
 }

Patch
diff mbox series

diff --git a/t/helper/test-regex.c b/t/helper/test-regex.c
index 10284cc56f..7a8ddce45b 100644
--- a/t/helper/test-regex.c
+++ b/t/helper/test-regex.c
@@ -41,16 +41,21 @@  int cmd__regex(int argc, const char **argv)
 {
 	const char *pat;
 	const char *str;
-	int flags = 0;
+	int ret, silent = 0, flags = 0;
 	regex_t r;
 	regmatch_t m[1];
+	char errbuf[64];
 
 	if (argc == 2 && !strcmp(argv[1], "--bug"))
 		return test_regex_bug();
 	else if (argc < 3)
 		usage("test-tool regex --bug\n"
-		      "test-tool regex <pattern> <string> [<options>]");
+		      "test-tool regex [--silent] <pattern> <string> [<options>]");
 
+	if (!strcmp(argv[1], "--silent")) {
+		silent = 1;
+		argv++;
+	}
 	argv++;
 	pat = *argv++;
 	str = *argv++;
@@ -67,8 +72,14 @@  int cmd__regex(int argc, const char **argv)
 	}
 	git_setup_gettext();
 
-	if (regcomp(&r, pat, flags))
-		die("failed regcomp() for pattern '%s'", pat);
+	ret = regcomp(&r, pat, flags);
+	if (ret) {
+		if (silent)
+			return 1;
+
+		regerror(ret, &r, errbuf, sizeof(errbuf));
+		die("failed regcomp() for pattern '%s' (%s)", pat, errbuf);
+	}
 	if (regexec(&r, str, 1, m, 0))
 		return 1;
 
diff --git a/t/t4210-log-i18n.sh b/t/t4210-log-i18n.sh
index c3792081e6..a89f456817 100755
--- a/t/t4210-log-i18n.sh
+++ b/t/t4210-log-i18n.sh
@@ -10,6 +10,12 @@  latin1_e=$(printf '\351')
 # invalid UTF-8
 invalid_e=$(printf '\303\50)') # ")" at end to close opening "("
 
+if test_have_prereq GETTEXT_LOCALE &&
+	! LC_ALL=$is_IS_locale test-tool regex --silent $latin1_e $latin1_e EXTENDED
+then
+	have_reg_illseq=1
+fi
+
 test_expect_success 'create commits in different encodings' '
 	test_tick &&
 	cat >msg <<-EOF &&
@@ -56,38 +62,68 @@  test_expect_success !MINGW 'log --grep does not find non-reencoded values (latin
 	test_must_be_empty actual
 '
 
+trigger_undefined_behaviour()
+{
+	local engine=$1
+
+	case $engine in
+	fixed)
+		if test -n "$have_reg_illseq" &&
+			! test_have_prereq LIBPCRE2
+		then
+			return 0
+		else
+			return 1
+		fi
+		;;
+	basic|extended)
+		if test -n "$have_reg_illseq"
+		then
+			return 0
+		else
+			return 1
+		fi
+		;;
+	perl)
+		return 1
+		;;
+	esac
+}
+
 for engine in fixed basic extended perl
 do
 	prereq=
 	if test $engine = "perl"
 	then
-		prereq="PCRE"
-	else
-		prereq=""
+		prereq=PCRE
 	fi
 	force_regex=
 	if test $engine != "fixed"
 	then
-	    force_regex=.*
+		force_regex='.*'
 	fi
-	test_expect_success !MINGW,!REGEX_ILLSEQ,GETTEXT_LOCALE,$prereq "-c grep.patternType=$engine log --grep does not find non-reencoded values (latin1 + locale)" "
-		cat >expect <<-\EOF &&
-		latin1
-		utf8
-		EOF
-		LC_ALL=\"$is_IS_locale\" git -c grep.patternType=$engine log --encoding=ISO-8859-1 --format=%s --grep=\"$force_regex$latin1_e\" >actual &&
-		test_cmp expect actual
-	"
 
 	test_expect_success !MINGW,GETTEXT_LOCALE,$prereq "-c grep.patternType=$engine log --grep does not find non-reencoded values (latin1 + locale)" "
-		LC_ALL=\"$is_IS_locale\" git -c grep.patternType=$engine log --encoding=ISO-8859-1 --format=%s --grep=\"$force_regex$utf8_e\" >actual &&
+		LC_ALL=$is_IS_locale git -c grep.patternType=$engine log --encoding=ISO-8859-1 --format=%s --grep=\"$force_regex$utf8_e\" >actual &&
 		test_must_be_empty actual
 	"
 
-	test_expect_success !MINGW,!REGEX_ILLSEQ,GETTEXT_LOCALE,$prereq "-c grep.patternType=$engine log --grep does not die on invalid UTF-8 value (latin1 + locale + invalid needle)" "
-		LC_ALL=\"$is_IS_locale\" git -c grep.patternType=$engine log --encoding=ISO-8859-1 --format=%s --grep=\"$force_regex$invalid_e\" >actual &&
-		test_must_be_empty actual
-	"
+	if ! trigger_undefined_behaviour $engine
+	then
+		test_expect_success !MINGW,GETTEXT_LOCALE,$prereq "-c grep.patternType=$engine log --grep searches in log output encoding (latin1 + locale)" "
+			cat >expect <<-\EOF &&
+			latin1
+			utf8
+			EOF
+			LC_ALL=$is_IS_locale git -c grep.patternType=$engine log --encoding=ISO-8859-1 --format=%s --grep=\"$force_regex$latin1_e\" >actual &&
+			test_cmp expect actual
+		"
+
+		test_expect_success !MINGW,GETTEXT_LOCALE,$prereq "-c grep.patternType=$engine log --grep does not die on invalid UTF-8 value (latin1 + locale + invalid needle)" "
+			LC_ALL=$is_IS_locale git -c grep.patternType=$engine log --encoding=ISO-8859-1 --format=%s --grep=\"$force_regex$invalid_e\" >actual &&
+			test_must_be_empty actual
+		"
+	fi
 done
 
 test_done
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 0ea1e5a05e..81473fea1d 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1454,12 +1454,6 @@  case $uname_s in
 	test_set_prereq SED_STRIPS_CR
 	test_set_prereq GREP_STRIPS_CR
 	;;
-FreeBSD)
-	test_set_prereq REGEX_ILLSEQ
-	test_set_prereq POSIXPERM
-	test_set_prereq BSLASHPSPEC
-	test_set_prereq EXECKEEPSPID
-	;;
 *)
 	test_set_prereq POSIXPERM
 	test_set_prereq BSLASHPSPEC