diff mbox series

t4210: detect REG_ILLSEQ dynamically

Message ID 20200513111636.30818-1-carenas@gmail.com (mailing list archive)
State New, archived
Headers show
Series t4210: detect REG_ILLSEQ dynamically | expand

Commit Message

Carlo Marcelo Arenas Belón May 13, 2020, 11:16 a.m. UTC
7187c7bbb8 (t4210: skip i18n tests that don't work on FreeBSD, 2019-11-27)
adds a REG_ILLSEQ prerequisite to avoid failures from the tests added with
4e2443b181 (log tests: test regex backends in "--encode=<enc>" tests,
2019-06-28), but hardcodes it to be only enabled for FreeBSD.

Instead of hardcoding the affected platform, add a test using test-tool to
make sure it can be dynamically detected in other affected systems (like
DragonFlyBSD or macOS), and while at it, enhance the tool so it can report
better on the cause of failure and be silent when needed.

To minimize changes, it is assumed the compilation error is because of the
right type since we control the only caller, and is also assumed to affect
both basic and extended syntax (only extended is tested).  The description
of the first test which wasn't accurate has been also corrected, and the
suppressed tests restricted to the two affected engines only.

Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
---
 t/helper/test-regex.c | 21 +++++++++++++++++----
 t/t4210-log-i18n.sh   | 24 ++++++++++++++++--------
 t/test-lib.sh         |  6 ------
 3 files changed, 33 insertions(+), 18 deletions(-)

Comments

Eric Sunshine May 13, 2020, 3:44 p.m. UTC | #1
On Wed, May 13, 2020 at 7:17 AM Carlo Marcelo Arenas Belón
<carenas@gmail.com> wrote:
> 7187c7bbb8 (t4210: skip i18n tests that don't work on FreeBSD, 2019-11-27)
> adds a REG_ILLSEQ prerequisite to avoid failures from the tests added with
> 4e2443b181 (log tests: test regex backends in "--encode=<enc>" tests,
> 2019-06-28), but hardcodes it to be only enabled for FreeBSD.
>
> Instead of hardcoding the affected platform, add a test using test-tool to
> make sure it can be dynamically detected in other affected systems (like
> DragonFlyBSD or macOS), and while at it, enhance the tool so it can report
> better on the cause of failure and be silent when needed.
> [...]
> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
> ---
> diff --git a/t/helper/test-regex.c b/t/helper/test-regex.c
> @@ -41,16 +41,21 @@ int cmd__regex(int argc, const char **argv)
>  {
> -       int flags = 0;
> +       int ret, opt_s = 0, flags = 0;
> [...]
> +       if (!strcmp(argv[1], "--silent")) {
> +               opt_s++;

Nit: When reading the declaration of 'opt_s', I found the name
inscrutable; it was only upon reading the actual code, that I
understood that it reflected whether --silent had been used. How about
giving it a more easily-understood name, such as 'silent'?

> diff --git a/t/t4210-log-i18n.sh b/t/t4210-log-i18n.sh
> @@ -56,21 +56,29 @@ test_expect_success !MINGW 'log --grep does not find non-reencoded values (latin
> +test_have_prereq GETTEXT_LOCALE &&
> +! LC_ALL=$is_IS_locale test-tool regex --silent $latin1_e $latin1_e EXTENDED &&
> +test_set_prereq REGEX_ILLSEQ

Nit: Is there precedent for formatting the code like this? At first
glance, I read these as three distinct statements rather than one
large composite statement. I wonder if indenting the continuation
lines would make this more easily-digested.

>  for engine in fixed basic extended perl
>  do
> +       ireq=
>         prereq=
> +       case $engine in
> +       basic|extended)
> +               ireq=!REGEX_ILLSEQ,
> +               ;;
> +       perl)
> +               prereq=PCRE
> +               ;;
> +       esac

Why do you introduce a new variable 'ireq' here considering that...

> +       test_expect_success !MINGW,GETTEXT_LOCALE,$ireq$prereq "-c grep.patternType=$engine log --grep searches in log output encoding (latin1 + locale)" "
> +       test_expect_success !MINGW,GETTEXT_LOCALE,$ireq$prereq "-c grep.patternType=$engine log --grep does not die on invalid UTF-8 value (latin1 + locale + invalid needle)" "

...it is always used alongside 'prereq'? It seems that you could just
assign "!REGEX_ILLSEQ" to 'prereq' without need for 'ireq'. (And
'ireq' is a rather inscrutable name -- I have trouble figuring out
what it means.)

>                 LC_ALL=\"$is_IS_locale\" git -c grep.patternType=$
> diff --git a/t/test-lib.sh b/t/test-lib.sh
> @@ -1454,12 +1454,6 @@ case $uname_s in
> -FreeBSD)
> -       test_set_prereq REGEX_ILLSEQ
> -       test_set_prereq POSIXPERM
> -       test_set_prereq BSLASHPSPEC
> -       test_set_prereq EXECKEEPSPID
> -       ;;

The commit message explains why you remove the 'REGEX_ILLSEQ', but why
are all the other lines removed, as well? Such removal seems unrelated
to the stated purpose of this patch.
Junio C Hamano May 13, 2020, 4:20 p.m. UTC | #2
Eric Sunshine <sunshine@sunshineco.com> writes:

>> diff --git a/t/helper/test-regex.c b/t/helper/test-regex.c
>> @@ -41,16 +41,21 @@ int cmd__regex(int argc, const char **argv)
>>  {
>> -       int flags = 0;
>> +       int ret, opt_s = 0, flags = 0;
>> [...]
>> +       if (!strcmp(argv[1], "--silent")) {
>> +               opt_s++;
>
> Nit: When reading the declaration of 'opt_s', I found the name
> inscrutable; it was only upon reading the actual code, that I
> understood that it reflected whether --silent had been used. How about
> giving it a more easily-understood name, such as 'silent'?

Yup, that is a reasonable fix to the readability problem.

>> diff --git a/t/t4210-log-i18n.sh b/t/t4210-log-i18n.sh
>> @@ -56,21 +56,29 @@ test_expect_success !MINGW 'log --grep does not find non-reencoded values (latin
>> +test_have_prereq GETTEXT_LOCALE &&
>> +! LC_ALL=$is_IS_locale test-tool regex --silent $latin1_e $latin1_e EXTENDED &&
>> +test_set_prereq REGEX_ILLSEQ
>
> Nit: Is there precedent for formatting the code like this? At first
> glance, I read these as three distinct statements rather than one
> large composite statement. I wonder if indenting the continuation
> lines would make this more easily-digested.

It's not like we are running three tests and expecting all of them
to succeed (or report a failure otherwise).  The first two are the
conditions we want to trigger the action the third one represents.

I'd prefer to see it written with "if then fi".  That would also be
a worthwhile readability fix.

>> -       test_set_prereq REGEX_ILLSEQ
>> -       test_set_prereq POSIXPERM
>> -       test_set_prereq BSLASHPSPEC
>> -       test_set_prereq EXECKEEPSPID
>> -       ;;
>
> The commit message explains why you remove the 'REGEX_ILLSEQ', but why
> are all the other lines removed, as well? Such removal seems unrelated
> to the stated purpose of this patch.

Yup, I was wondering about the same thing.

Thanks.
Carlo Marcelo Arenas Belón May 13, 2020, 8:18 p.m. UTC | #3
On Wed, May 13, 2020 at 11:44:53AM -0400, Eric Sunshine wrote:
> On Wed, May 13, 2020 at 7:17 AM Carlo Marcelo Arenas Belón
> <carenas@gmail.com> wrote:
> > diff --git a/t/t4210-log-i18n.sh b/t/t4210-log-i18n.sh
> > @@ -56,21 +56,29 @@ test_expect_success !MINGW 'log --grep does not find non-reencoded values (latin
> > +test_have_prereq GETTEXT_LOCALE &&
> > +! LC_ALL=$is_IS_locale test-tool regex --silent $latin1_e $latin1_e EXTENDED &&
> > +test_set_prereq REGEX_ILLSEQ
> 
> Nit: Is there precedent for formatting the code like this? At first
> glance, I read these as three distinct statements rather than one
> large composite statement. I wonder if indenting the continuation
> lines would make this more easily-digested.

yes, I copied the syntax from t7812, and I agree looks ugly and would had
rather done it with an if as Junio suggested, but couldn't find precedent
in another tests.

indeed, I would rather go away with the whole prereq and set a variable
with a nice sounding name and use it below to `if test` the right tests,
would that be ok?

> >  for engine in fixed basic extended perl
> >  do
> > +       ireq=
> >         prereq=
> > +       case $engine in
> > +       basic|extended)
> > +               ireq=!REGEX_ILLSEQ,
> > +               ;;
> > +       perl)
> > +               prereq=PCRE
> > +               ;;
> > +       esac
> 
> Why do you introduce a new variable 'ireq' here considering that...
> 
> > +       test_expect_success !MINGW,GETTEXT_LOCALE,$ireq$prereq "-c grep.patternType=$engine log --grep searches in log output encoding (latin1 + locale)" "
> > +       test_expect_success !MINGW,GETTEXT_LOCALE,$ireq$prereq "-c grep.patternType=$engine log --grep does not die on invalid UTF-8 value (latin1 + locale + invalid needle)" "
> 
> ...it is always used alongside 'prereq'? It seems that you could just
> assign "!REGEX_ILLSEQ" to 'prereq' without need for 'ireq'. (And
> 'ireq' is a rather inscrutable name -- I have trouble figuring out
> what it means.)

sadly I can't because there are 3 tests, and only 2 of them (the ones shown
in the patch) will have that prerequisite dynamically added, while all
will have $prereq.

will send a v2 as an RFC with your suggestions

> >                 LC_ALL=\"$is_IS_locale\" git -c grep.patternType=$
> > diff --git a/t/test-lib.sh b/t/test-lib.sh
> > @@ -1454,12 +1454,6 @@ case $uname_s in
> > -FreeBSD)
> > -       test_set_prereq REGEX_ILLSEQ
> > -       test_set_prereq POSIXPERM
> > -       test_set_prereq BSLASHPSPEC
> > -       test_set_prereq EXECKEEPSPID
> > -       ;;
> 
> The commit message explains why you remove the 'REGEX_ILLSEQ', but why
> are all the other lines removed, as well? Such removal seems unrelated
> to the stated purpose of this patch.

they were all added by the previous fix I am ammending and therefore are
no longer needed.

the 3 unrelated variables were originally copied from the '*' entry on
that case, and therefore FreeBSD will now use the ones there instead of
its special case.

Carlo
Junio C Hamano May 13, 2020, 8:37 p.m. UTC | #4
Carlo Marcelo Arenas Belón <carenas@gmail.com> writes:

> indeed, I would rather go away with the whole prereq and set a variable
> with a nice sounding name and use it below to `if test` the right tests,
> would that be ok?

In a sense, a prerequisite is an overkill if the tests that need to
be guarded are very well isolated and in a single place.  Do we
expect other tests outside the context of this script may have to be
guarded by REG_ILLSEQ?
Carlo Marcelo Arenas Belón May 13, 2020, 9:04 p.m. UTC | #5
On Wed, May 13, 2020 at 01:37:32PM -0700, Junio C Hamano wrote:
> Carlo Marcelo Arenas Belón <carenas@gmail.com> writes:
> 
> > indeed, I would rather go away with the whole prereq and set a variable
> > with a nice sounding name and use it below to `if test` the right tests,
> > would that be ok?
> 
> In a sense, a prerequisite is an overkill if the tests that need to
> be guarded are very well isolated and in a single place.  Do we
> expect other tests outside the context of this script may have to be
> guarded by REG_ILLSEQ?

IMHO in the long term we probably should get rid of those tests (including
the need to look for REG_ILLSEQ), because the root cause of this failure is
that the test is really relying in undefined behaviour which happen to be
common to both glibc and our own compat regex, but that was relevant at that
time, because of a bug on the way UTF-8 was being detected.

POSIX regcomp[1], specifically mentions that returned and error is a valid
response to a badly encoded pattern, but the test cases were created to
specifically ensure no error is ever generated, and to make sure the response
(which is undefined, as per POSIX) was consistent for all engines:

"The regcomp() and regexec() functions are required to accept any null-terminated string as the pattern argument. If the meaning of the string is "undefined", the behavior of the function is "unspecified". IEEE Std 1003.1-2001 does not specify how the functions will interpret the pattern; they might return error codes, or they might do pattern matching in some completely unexpected way, but they should not do something like abort the process."

Carlo

[1] https://pubs.opengroup.org/onlinepubs/009695399/functions/regcomp.html
diff mbox series

Patch

diff --git a/t/helper/test-regex.c b/t/helper/test-regex.c
index 10284cc56f..985fcb0f75 100644
--- a/t/helper/test-regex.c
+++ b/t/helper/test-regex.c
@@ -41,16 +41,21 @@  int cmd__regex(int argc, const char **argv)
 {
 	const char *pat;
 	const char *str;
-	int flags = 0;
+	int ret, opt_s = 0, flags = 0;
 	regex_t r;
 	regmatch_t m[1];
+	char errbuf[64];
 
 	if (argc == 2 && !strcmp(argv[1], "--bug"))
 		return test_regex_bug();
 	else if (argc < 3)
 		usage("test-tool regex --bug\n"
-		      "test-tool regex <pattern> <string> [<options>]");
+		      "test-tool regex [--silent] <pattern> <string> [<options>]");
 
+	if (!strcmp(argv[1], "--silent")) {
+		opt_s++;
+		argv++;
+	}
 	argv++;
 	pat = *argv++;
 	str = *argv++;
@@ -67,8 +72,16 @@  int cmd__regex(int argc, const char **argv)
 	}
 	git_setup_gettext();
 
-	if (regcomp(&r, pat, flags))
-		die("failed regcomp() for pattern '%s'", pat);
+	ret = regcomp(&r, pat, flags);
+	if (ret) {
+		if (opt_s)
+			return 1;
+		else {
+			regerror(ret, &r, errbuf, sizeof(errbuf));
+			die("failed regcomp() for pattern '%s' (%s)",
+				pat, errbuf);
+		}
+	}
 	if (regexec(&r, str, 1, m, 0))
 		return 1;
 
diff --git a/t/t4210-log-i18n.sh b/t/t4210-log-i18n.sh
index c3792081e6..24b9e7e72b 100755
--- a/t/t4210-log-i18n.sh
+++ b/t/t4210-log-i18n.sh
@@ -56,21 +56,29 @@  test_expect_success !MINGW 'log --grep does not find non-reencoded values (latin
 	test_must_be_empty actual
 '
 
+test_have_prereq GETTEXT_LOCALE &&
+! LC_ALL=$is_IS_locale test-tool regex --silent $latin1_e $latin1_e EXTENDED &&
+test_set_prereq REGEX_ILLSEQ
+
 for engine in fixed basic extended perl
 do
+	ireq=
 	prereq=
-	if test $engine = "perl"
-	then
-		prereq="PCRE"
-	else
-		prereq=""
-	fi
+	case $engine in
+	basic|extended)
+		ireq=!REGEX_ILLSEQ,
+		;;
+	perl)
+		prereq=PCRE
+		;;
+	esac
 	force_regex=
 	if test $engine != "fixed"
 	then
 	    force_regex=.*
 	fi
-	test_expect_success !MINGW,!REGEX_ILLSEQ,GETTEXT_LOCALE,$prereq "-c grep.patternType=$engine log --grep does not find non-reencoded values (latin1 + locale)" "
+
+	test_expect_success !MINGW,GETTEXT_LOCALE,$ireq$prereq "-c grep.patternType=$engine log --grep searches in log output encoding (latin1 + locale)" "
 		cat >expect <<-\EOF &&
 		latin1
 		utf8
@@ -84,7 +92,7 @@  do
 		test_must_be_empty actual
 	"
 
-	test_expect_success !MINGW,!REGEX_ILLSEQ,GETTEXT_LOCALE,$prereq "-c grep.patternType=$engine log --grep does not die on invalid UTF-8 value (latin1 + locale + invalid needle)" "
+	test_expect_success !MINGW,GETTEXT_LOCALE,$ireq$prereq "-c grep.patternType=$engine log --grep does not die on invalid UTF-8 value (latin1 + locale + invalid needle)" "
 		LC_ALL=\"$is_IS_locale\" git -c grep.patternType=$engine log --encoding=ISO-8859-1 --format=%s --grep=\"$force_regex$invalid_e\" >actual &&
 		test_must_be_empty actual
 	"
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 0ea1e5a05e..81473fea1d 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1454,12 +1454,6 @@  case $uname_s in
 	test_set_prereq SED_STRIPS_CR
 	test_set_prereq GREP_STRIPS_CR
 	;;
-FreeBSD)
-	test_set_prereq REGEX_ILLSEQ
-	test_set_prereq POSIXPERM
-	test_set_prereq BSLASHPSPEC
-	test_set_prereq EXECKEEPSPID
-	;;
 *)
 	test_set_prereq POSIXPERM
 	test_set_prereq BSLASHPSPEC