diff mbox series

Port helper/test-ctype.c to unit-tests/t-ctype.c

Message ID 20231221231527.8130-1-ach.lumap@gmail.com (mailing list archive)
State Superseded
Headers show
Series Port helper/test-ctype.c to unit-tests/t-ctype.c | expand

Commit Message

Achu Luma Dec. 21, 2023, 11:15 p.m. UTC
In the recent codebase update (8bf6fbd00d (Merge branch
'js/doc-unit-tests', 2023-12-09)), a new unit testing framework was
merged, providing a standardized approach for testing C code. Prior to
this update, some unit tests relied on the test helper mechanism,
lacking a dedicated unit testing framework. It's more natural to perform
these unit tests using the new unit test framework.

This commit migrates the unit tests for C character classification
functions (isdigit(), isspace(), etc) from the legacy approach
using the test-tool command `test-tool ctype` in t/helper/test-ctype.c
to the new unit testing framework (t/unit-tests/test-lib.h).

The migration involves refactoring the tests to utilize the testing
macros provided by the framework (TEST() and check_*()).

Mentored-by: Christian Couder <chriscool@tuxfamily.org> 
Signed-off-by: Achu Luma <ach.lumap@gmail.com>
---
 Makefile               |  2 +-
 t/helper/test-ctype.c  | 70 --------------------------------------
 t/helper/test-tool.c   |  1 -
 t/helper/test-tool.h   |  1 -
 t/t0070-fundamental.sh |  4 ---
 t/unit-tests/t-ctype.c | 76 ++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 77 insertions(+), 77 deletions(-)
 delete mode 100644 t/helper/test-ctype.c
 create mode 100644 t/unit-tests/t-ctype.c


base-commit: 055bb6e9969085777b7fab83e3fee0017654f134

Comments

Junio C Hamano Dec. 26, 2023, 6:45 p.m. UTC | #1
Achu Luma <ach.lumap@gmail.com> writes:

> diff --git a/t/helper/test-ctype.c b/t/helper/test-ctype.c
> deleted file mode 100644
> index e5659df40b..0000000000
> --- a/t/helper/test-ctype.c
> +++ /dev/null
> @@ -1,70 +0,0 @@
> -#include "test-tool.h"
> -
> -static int rc;
> -
> -static void report_error(const char *class, int ch)
> -{
> -	printf("%s classifies char %d (0x%02x) wrongly\n", class, ch, ch);
> -	rc = 1;
> -}

So, if we have a is_foo() that characterises a byte that ought to
be "foo" but gets miscategorised not to be "foo", we used to
pinpoint exactly the byte value that was an issue.  We did not do
any early return ...

> ...
> -#define TEST_CLASS(t,s) {			\
> -	int i;					\
> -	for (i = 0; i < 256; i++) {		\
> -		if (is_in(s, i) != t(i))	\
> -			report_error(#t, i);	\
> -	}					\
> -	if (t(EOF))				\
> -		report_error(#t, EOF);		\
> -}

... and reported for all errors in the "class".

> diff --git a/t/unit-tests/t-ctype.c b/t/unit-tests/t-ctype.c
> new file mode 100644
> index 0000000000..41189ba9f9
> --- /dev/null
> +++ b/t/unit-tests/t-ctype.c
> @@ -0,0 +1,76 @@
> +#include "test-lib.h"
> +
> +static int is_in(const char *s, int ch)
> +{
> +	/*
> +	 * We can't find NUL using strchr. Accept it as the first
> +	 * character in the spec -- there are no empty classes.
> +	 */
> +	if (ch == '\0')
> +		return ch == *s;
> +	if (*s == '\0')
> +		s++;
> +	return !!strchr(s, ch);
> +}
> +
> +/* Macro to test a character type */
> +#define TEST_CTYPE_FUNC(func, string)			\
> +static void test_ctype_##func(void)				\
> +{								\
> +	int i;                                     	 	\
> +	for (i = 0; i < 256; i++)                 		\
> +		check_int(func(i), ==, is_in(string, i)); 	\
> +}

Now, we let check_int() to do the checking for each and every byte
value for the class.  check_int() uses different reporting and shows
the problematic value in a way that is more verbose and at the same
time is a less specific and harder to understand:

		test_msg("   left: %"PRIdMAX, a);
		test_msg("  right: %"PRIdMAX, b);

But that is probably the price to pay to use a more generic
framework, I guess.

> +int cmd_main(int argc, const char **argv) {
> +	/* Run all character type tests */
> +	TEST(test_ctype_isspace(), "isspace() works as we expect");
> +	TEST(test_ctype_isdigit(), "isdigit() works as we expect");
> +	TEST(test_ctype_isalpha(), "isalpha() works as we expect");
> +	TEST(test_ctype_isalnum(), "isalnum() works as we expect");
> +	TEST(test_ctype_is_glob_special(), "is_glob_special() works as we expect");
> +	TEST(test_ctype_is_regex_special(), "is_regex_special() works as we expect");
> +	TEST(test_ctype_is_pathspec_magic(), "is_pathspec_magic() works as we expect");
> +	TEST(test_ctype_isascii(), "isascii() works as we expect");
> +	TEST(test_ctype_islower(), "islower() works as we expect");
> +	TEST(test_ctype_isupper(), "isupper() works as we expect");
> +	TEST(test_ctype_iscntrl(), "iscntrl() works as we expect");
> +	TEST(test_ctype_ispunct(), "ispunct() works as we expect");
> +	TEST(test_ctype_isxdigit(), "isxdigit() works as we expect");
> +	TEST(test_ctype_isprint(), "isprint() works as we expect");
> +
> +	return test_done();
> +}

As a practice to use the unit-tests framework, the patch looks OK.
helper/test-ctype.c indeed is an oddball that runs once and checks
everything it wants to check, for which the unit tests framework is
much more suited.

Let's see how others react and then queue.

Thanks.
Christian Couder Dec. 27, 2023, 10:57 a.m. UTC | #2
On Tue, Dec 26, 2023 at 7:46 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Achu Luma <ach.lumap@gmail.com> writes:

> > +/* Macro to test a character type */
> > +#define TEST_CTYPE_FUNC(func, string)                        \
> > +static void test_ctype_##func(void)                          \
> > +{                                                            \
> > +     int i;                                                  \
> > +     for (i = 0; i < 256; i++)                               \
> > +             check_int(func(i), ==, is_in(string, i));       \
> > +}
>
> Now, we let check_int() to do the checking for each and every byte
> value for the class.  check_int() uses different reporting and shows
> the problematic value in a way that is more verbose and at the same
> time is a less specific and harder to understand:
>
>                 test_msg("   left: %"PRIdMAX, a);
>                 test_msg("  right: %"PRIdMAX, b);
>
> But that is probably the price to pay to use a more generic
> framework, I guess.

I have added Phillip and Josh in Cc: as they might have ideas about this.

Also it might not be a big issue here, but when the new unit test
framework was proposed, I commented on the fact that "left" and
"right" were perhaps a bit less explicit than "actual" and "expected".

> > +int cmd_main(int argc, const char **argv) {
> > +     /* Run all character type tests */
> > +     TEST(test_ctype_isspace(), "isspace() works as we expect");
> > +     TEST(test_ctype_isdigit(), "isdigit() works as we expect");
> > +     TEST(test_ctype_isalpha(), "isalpha() works as we expect");
> > +     TEST(test_ctype_isalnum(), "isalnum() works as we expect");
> > +     TEST(test_ctype_is_glob_special(), "is_glob_special() works as we expect");
> > +     TEST(test_ctype_is_regex_special(), "is_regex_special() works as we expect");
> > +     TEST(test_ctype_is_pathspec_magic(), "is_pathspec_magic() works as we expect");
> > +     TEST(test_ctype_isascii(), "isascii() works as we expect");
> > +     TEST(test_ctype_islower(), "islower() works as we expect");
> > +     TEST(test_ctype_isupper(), "isupper() works as we expect");
> > +     TEST(test_ctype_iscntrl(), "iscntrl() works as we expect");
> > +     TEST(test_ctype_ispunct(), "ispunct() works as we expect");
> > +     TEST(test_ctype_isxdigit(), "isxdigit() works as we expect");
> > +     TEST(test_ctype_isprint(), "isprint() works as we expect");
> > +
> > +     return test_done();
> > +}
>
> As a practice to use the unit-tests framework, the patch looks OK.
> helper/test-ctype.c indeed is an oddball that runs once and checks
> everything it wants to check, for which the unit tests framework is
> much more suited.

Yeah, I agree.

> Let's see how others react and then queue.
>
> Thanks.
René Scharfe Dec. 27, 2023, 11:57 a.m. UTC | #3
Am 27.12.23 um 11:57 schrieb Christian Couder:
> On Tue, Dec 26, 2023 at 7:46 PM Junio C Hamano <gitster@pobox.com> wrote:
>>
>> Achu Luma <ach.lumap@gmail.com> writes:
>
>>> +/* Macro to test a character type */
>>> +#define TEST_CTYPE_FUNC(func, string)                        \
>>> +static void test_ctype_##func(void)                          \
>>> +{                                                            \
>>> +     int i;                                                  \
>>> +     for (i = 0; i < 256; i++)                               \
>>> +             check_int(func(i), ==, is_in(string, i));       \
>>> +}
>>
>> Now, we let check_int() to do the checking for each and every byte
>> value for the class.  check_int() uses different reporting and shows
>> the problematic value in a way that is more verbose and at the same
>> time is a less specific and harder to understand:
>>
>>                 test_msg("   left: %"PRIdMAX, a);
>>                 test_msg("  right: %"PRIdMAX, b);
>>
>> But that is probably the price to pay to use a more generic
>> framework, I guess.
>
> I have added Phillip and Josh in Cc: as they might have ideas about this.

You can write custom messages for custom tests using test_assert().

> Also it might not be a big issue here, but when the new unit test
> framework was proposed, I commented on the fact that "left" and
> "right" were perhaps a bit less explicit than "actual" and "expected".

True.

>>> +int cmd_main(int argc, const char **argv) {
>>> +     /* Run all character type tests */
>>> +     TEST(test_ctype_isspace(), "isspace() works as we expect");
>>> +     TEST(test_ctype_isdigit(), "isdigit() works as we expect");
>>> +     TEST(test_ctype_isalpha(), "isalpha() works as we expect");
>>> +     TEST(test_ctype_isalnum(), "isalnum() works as we expect");
>>> +     TEST(test_ctype_is_glob_special(), "is_glob_special() works as we expect");
>>> +     TEST(test_ctype_is_regex_special(), "is_regex_special() works as we expect");
>>> +     TEST(test_ctype_is_pathspec_magic(), "is_pathspec_magic() works as we expect");
>>> +     TEST(test_ctype_isascii(), "isascii() works as we expect");
>>> +     TEST(test_ctype_islower(), "islower() works as we expect");
>>> +     TEST(test_ctype_isupper(), "isupper() works as we expect");
>>> +     TEST(test_ctype_iscntrl(), "iscntrl() works as we expect");
>>> +     TEST(test_ctype_ispunct(), "ispunct() works as we expect");
>>> +     TEST(test_ctype_isxdigit(), "isxdigit() works as we expect");
>>> +     TEST(test_ctype_isprint(), "isprint() works as we expect");
>>> +
>>> +     return test_done();
>>> +}
>>
>> As a practice to use the unit-tests framework, the patch looks OK.

The added repetition is a bit grating.  With a bit of setup, loop
unrolling and stringification you can retain the property of only having
to mention the class name once.  Demo patch below.

>> helper/test-ctype.c indeed is an oddball that runs once and checks
>> everything it wants to check, for which the unit tests framework is
>> much more suited.
>
> Yeah, I agree.

Indeed: test-ctype does unit tests, so the unit test framework should
be a perfect fit.  It still feels a bit raw that this point, though,
but that's to be expected.

René


diff --git a/t/unit-tests/t-ctype.c b/t/unit-tests/t-ctype.c
index 41189ba9f9..7903856cec 100644
--- a/t/unit-tests/t-ctype.c
+++ b/t/unit-tests/t-ctype.c
@@ -13,13 +13,23 @@ static int is_in(const char *s, int ch)
 	return !!strchr(s, ch);
 }

-/* Macro to test a character type */
-#define TEST_CTYPE_FUNC(func, string)			\
-static void test_ctype_##func(void)				\
-{								\
-	int i;                                     	 	\
-	for (i = 0; i < 256; i++)                 		\
-		check_int(func(i), ==, is_in(string, i)); 	\
+struct ctype {
+	const char *name;
+	const char *expect;
+	int actual[256];
+};
+
+static void test_ctype(const struct ctype *class)
+{
+	for (int i = 0; i < 256; i++) {
+		int expect = is_in(class->expect, i);
+		int actual = class->actual[i];
+		int res = test_assert(TEST_LOCATION(), class->name,
+				      actual == expect);
+		if (!res)
+			test_msg("%s classifies char %d (0x%02x) wrongly",
+				 class->name, i, i);
+	}
 }

 #define DIGIT "0123456789"
@@ -40,37 +50,39 @@ static void test_ctype_##func(void)				\
 	"\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" \
 	"\x7f"

-TEST_CTYPE_FUNC(isdigit, DIGIT)
-TEST_CTYPE_FUNC(isspace, " \n\r\t")
-TEST_CTYPE_FUNC(isalpha, LOWER UPPER)
-TEST_CTYPE_FUNC(isalnum, LOWER UPPER DIGIT)
-TEST_CTYPE_FUNC(is_glob_special, "*?[\\")
-TEST_CTYPE_FUNC(is_regex_special, "$()*+.?[\\^{|")
-TEST_CTYPE_FUNC(is_pathspec_magic, "!\"#%&',-/:;<=>@_`~")
-TEST_CTYPE_FUNC(isascii, ASCII)
-TEST_CTYPE_FUNC(islower, LOWER)
-TEST_CTYPE_FUNC(isupper, UPPER)
-TEST_CTYPE_FUNC(iscntrl, CNTRL)
-TEST_CTYPE_FUNC(ispunct, PUNCT)
-TEST_CTYPE_FUNC(isxdigit, DIGIT "abcdefABCDEF")
-TEST_CTYPE_FUNC(isprint, LOWER UPPER DIGIT PUNCT " ")
+#define APPLY16(f, n) \
+	f(n + 0x0), f(n + 0x1), f(n + 0x2), f(n + 0x3), \
+	f(n + 0x4), f(n + 0x5), f(n + 0x6), f(n + 0x7), \
+	f(n + 0x8), f(n + 0x9), f(n + 0xa), f(n + 0xb), \
+	f(n + 0xc), f(n + 0xd), f(n + 0xe), f(n + 0xf)
+#define APPLY256(f) \
+	APPLY16(f, 0x00), APPLY16(f, 0x10), APPLY16(f, 0x20), APPLY16(f, 0x30),\
+	APPLY16(f, 0x40), APPLY16(f, 0x50), APPLY16(f, 0x60), APPLY16(f, 0x70),\
+	APPLY16(f, 0x80), APPLY16(f, 0x90), APPLY16(f, 0xa0), APPLY16(f, 0xb0),\
+	APPLY16(f, 0xc0), APPLY16(f, 0xd0), APPLY16(f, 0xe0), APPLY16(f, 0xf0),\
+
+#define CTYPE(name, expect) { #name, expect, { APPLY256(name) }  }

 int cmd_main(int argc, const char **argv) {
+	struct ctype classes[] = {
+		CTYPE(isdigit, DIGIT),
+		CTYPE(isspace, " \n\r\t"),
+		CTYPE(isalpha, LOWER UPPER),
+		CTYPE(isalnum, LOWER UPPER DIGIT),
+		CTYPE(is_glob_special, "*?[\\"),
+		CTYPE(is_regex_special, "$()*+.?[\\^{|"),
+		CTYPE(is_pathspec_magic, "!\"#%&',-/:;<=>@_`~"),
+		CTYPE(isascii, ASCII),
+		CTYPE(islower, LOWER),
+		CTYPE(isupper, UPPER),
+		CTYPE(iscntrl, CNTRL),
+		CTYPE(ispunct, PUNCT),
+		CTYPE(isxdigit, DIGIT "abcdefABCDEF"),
+		CTYPE(isprint, LOWER UPPER DIGIT PUNCT " "),
+	};
 	/* Run all character type tests */
-	TEST(test_ctype_isspace(), "isspace() works as we expect");
-	TEST(test_ctype_isdigit(), "isdigit() works as we expect");
-	TEST(test_ctype_isalpha(), "isalpha() works as we expect");
-	TEST(test_ctype_isalnum(), "isalnum() works as we expect");
-	TEST(test_ctype_is_glob_special(), "is_glob_special() works as we expect");
-	TEST(test_ctype_is_regex_special(), "is_regex_special() works as we expect");
-	TEST(test_ctype_is_pathspec_magic(), "is_pathspec_magic() works as we expect");
-	TEST(test_ctype_isascii(), "isascii() works as we expect");
-	TEST(test_ctype_islower(), "islower() works as we expect");
-	TEST(test_ctype_isupper(), "isupper() works as we expect");
-	TEST(test_ctype_iscntrl(), "iscntrl() works as we expect");
-	TEST(test_ctype_ispunct(), "ispunct() works as we expect");
-	TEST(test_ctype_isxdigit(), "isxdigit() works as we expect");
-	TEST(test_ctype_isprint(), "isprint() works as we expect");
+	for (int i = 0; i < ARRAY_SIZE(classes); i++)
+		TEST(test_ctype(&classes[i]), "%s works", classes[i].name);

 	return test_done();
 }
Phillip Wood Dec. 27, 2023, 2:40 p.m. UTC | #4
On 27/12/2023 11:57, René Scharfe wrote:
> Am 27.12.23 um 11:57 schrieb Christian Couder:
>> On Tue, Dec 26, 2023 at 7:46 PM Junio C Hamano <gitster@pobox.com> wrote:
>>>
>>> Achu Luma <ach.lumap@gmail.com> writes:
>>
>>>> +/* Macro to test a character type */
>>>> +#define TEST_CTYPE_FUNC(func, string)                        \
>>>> +static void test_ctype_##func(void)                          \
>>>> +{                                                            \
>>>> +     int i;                                                  \
>>>> +     for (i = 0; i < 256; i++)                               \
>>>> +             check_int(func(i), ==, is_in(string, i));       \
>>>> +}
>>>
>>> Now, we let check_int() to do the checking for each and every byte
>>> value for the class.  check_int() uses different reporting and shows
>>> the problematic value in a way that is more verbose and at the same
>>> time is a less specific and harder to understand:
>>>
>>>                  test_msg("   left: %"PRIdMAX, a);
>>>                  test_msg("  right: %"PRIdMAX, b);
>>>
>>> But that is probably the price to pay to use a more generic
>>> framework, I guess.
>>
>> I have added Phillip and Josh in Cc: as they might have ideas about this.
> 
> You can write custom messages for custom tests using test_assert().

Another possibility is to do

	for (int i = 0; i < 256; i++) {
		if (!check_int(func(i), ==, is_in(string, i))
			test_msg("       i: %02x", i);
	}

To print the character code as well as the actual and expected return 
values of check_int(). The funny spacing is intended to keep the output 
aligned. I did wonder if we should be using

	check(func(i) == is_in(string, i))

instead of check_int() but I think it is useful to have the return value 
printed on error in case we start returning "53" instead of "1" for 
"true" [1]. With the extra test_msg() above we can now see if the test 
fails because of a mis-categorization or because func() returned a 
different non-zero value when we were expecting "1".

>> Also it might not be a big issue here, but when the new unit test
>> framework was proposed, I commented on the fact that "left" and
>> "right" were perhaps a bit less explicit than "actual" and "expected".

If people are worried about this then it would be possible to change the 
check_xxx() macros pass the stringified relational operator into the 
various check_xxx_loc() functions and then print "expected" and "actual" 
when the operator is "==" and "left" and "right" otherwise.

Best Wishes

Phillip

[1] As an aside I wonder if the ctype functions would make good test 
balloons for using _Bool by changing sane_istest() to be

#define sane_istest(x,mask) ((bool)(sane_ctype[(unsigned char)(x)] & 
(mask)))

so that we check casting to _Bool coerces non-zero values to "1"
Junio C Hamano Dec. 27, 2023, 11:48 p.m. UTC | #5
René Scharfe <l.s.r@web.de> writes:

>> Also it might not be a big issue here, but when the new unit test
>> framework was proposed, I commented on the fact that "left" and
>> "right" were perhaps a bit less explicit than "actual" and "expected".
>
> True.
> ...
> The added repetition is a bit grating.  With a bit of setup, loop
> unrolling and stringification you can retain the property of only having
> to mention the class name once.  Demo patch below.

Nice.

This (and your mempool thing) being one of the early efforts to
adopt the unit-test framework outside the initial set of sample
tests, it is understandable that we might find what framework offers
is still lacking.  But at the same time, while the macro tricks
demonstrated here are all amusing to read and admire, it feels a bit
too much to expect that the test writers are willing to invent
something like these every time they want to test.

Being a relatively faithful conversion of the original ctype tests,
with its thorough enumeration of test samples and expected output,
is what makes this test program require these macro tricks, and it
does not have much to do with the features (or lack thereof) of the
framework, I guess.

> +struct ctype {
> +	const char *name;
> +	const char *expect;
> +	int actual[256];
> +};
> +
> +static void test_ctype(const struct ctype *class)
> +{
> +	for (int i = 0; i < 256; i++) {
> +		int expect = is_in(class->expect, i);
> +		int actual = class->actual[i];
> +		int res = test_assert(TEST_LOCATION(), class->name,
> +				      actual == expect);
> +		if (!res)
> +			test_msg("%s classifies char %d (0x%02x) wrongly",
> +				 class->name, i, i);
> +	}
>  }

Somehow, the "test_assert" does not seem to be adding much value
here (i.e. we can do "res = (actual == expect)" there).  Is this
because we want to be able to report success, too?

    ... goes and looks at test_assert() ...

Ah, is it because we want to be able to "skip" (which pretends that
the assert() was satisified).  OK, but then the error reporting from
it is redundant with our own test_msg().  

Everything below this line was a fun read ;-)

Thanks.

> ...
> +#define APPLY16(f, n) \
> +	f(n + 0x0), f(n + 0x1), f(n + 0x2), f(n + 0x3), \
> +	f(n + 0x4), f(n + 0x5), f(n + 0x6), f(n + 0x7), \
> +	f(n + 0x8), f(n + 0x9), f(n + 0xa), f(n + 0xb), \
> +	f(n + 0xc), f(n + 0xd), f(n + 0xe), f(n + 0xf)
> +#define APPLY256(f) \
> +	APPLY16(f, 0x00), APPLY16(f, 0x10), APPLY16(f, 0x20), APPLY16(f, 0x30),\
> +	APPLY16(f, 0x40), APPLY16(f, 0x50), APPLY16(f, 0x60), APPLY16(f, 0x70),\
> +	APPLY16(f, 0x80), APPLY16(f, 0x90), APPLY16(f, 0xa0), APPLY16(f, 0xb0),\
> +	APPLY16(f, 0xc0), APPLY16(f, 0xd0), APPLY16(f, 0xe0), APPLY16(f, 0xf0),\
> +
> +#define CTYPE(name, expect) { #name, expect, { APPLY256(name) }  }
>
>  int cmd_main(int argc, const char **argv) {
> +	struct ctype classes[] = {
> +		CTYPE(isdigit, DIGIT),
> +		CTYPE(isspace, " \n\r\t"),
> +		CTYPE(isalpha, LOWER UPPER),
> +		CTYPE(isalnum, LOWER UPPER DIGIT),
> +		CTYPE(is_glob_special, "*?[\\"),
> +		CTYPE(is_regex_special, "$()*+.?[\\^{|"),
> +		CTYPE(is_pathspec_magic, "!\"#%&',-/:;<=>@_`~"),
> +		CTYPE(isascii, ASCII),
> +		CTYPE(islower, LOWER),
> +		CTYPE(isupper, UPPER),
> +		CTYPE(iscntrl, CNTRL),
> +		CTYPE(ispunct, PUNCT),
> +		CTYPE(isxdigit, DIGIT "abcdefABCDEF"),
> +		CTYPE(isprint, LOWER UPPER DIGIT PUNCT " "),
> +	};
>  	/* Run all character type tests */
> -	TEST(test_ctype_isspace(), "isspace() works as we expect");
> -	TEST(test_ctype_isdigit(), "isdigit() works as we expect");
> -	TEST(test_ctype_isalpha(), "isalpha() works as we expect");
> -	TEST(test_ctype_isalnum(), "isalnum() works as we expect");
> -	TEST(test_ctype_is_glob_special(), "is_glob_special() works as we expect");
> -	TEST(test_ctype_is_regex_special(), "is_regex_special() works as we expect");
> -	TEST(test_ctype_is_pathspec_magic(), "is_pathspec_magic() works as we expect");
> -	TEST(test_ctype_isascii(), "isascii() works as we expect");
> -	TEST(test_ctype_islower(), "islower() works as we expect");
> -	TEST(test_ctype_isupper(), "isupper() works as we expect");
> -	TEST(test_ctype_iscntrl(), "iscntrl() works as we expect");
> -	TEST(test_ctype_ispunct(), "ispunct() works as we expect");
> -	TEST(test_ctype_isxdigit(), "isxdigit() works as we expect");
> -	TEST(test_ctype_isprint(), "isprint() works as we expect");
> +	for (int i = 0; i < ARRAY_SIZE(classes); i++)
> +		TEST(test_ctype(&classes[i]), "%s works", classes[i].name);
>
>  	return test_done();
>  }
René Scharfe Dec. 28, 2023, 4:05 p.m. UTC | #6
Am 28.12.23 um 00:48 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
>
>>> Also it might not be a big issue here, but when the new unit test
>>> framework was proposed, I commented on the fact that "left" and
>>> "right" were perhaps a bit less explicit than "actual" and "expected".
>>
>> True.
>> ...
>> The added repetition is a bit grating.  With a bit of setup, loop
>> unrolling and stringification you can retain the property of only having
>> to mention the class name once.  Demo patch below.
>
> Nice.
>
> This (and your mempool thing) being one of the early efforts to
> adopt the unit-test framework outside the initial set of sample
> tests, it is understandable that we might find what framework offers
> is still lacking.  But at the same time, while the macro tricks
> demonstrated here are all amusing to read and admire, it feels a bit
> too much to expect that the test writers are willing to invent
> something like these every time they want to test.
>
> Being a relatively faithful conversion of the original ctype tests,
> with its thorough enumeration of test samples and expected output,
> is what makes this test program require these macro tricks, and it
> does not have much to do with the features (or lack thereof) of the
> framework, I guess.

*nod*

>
>> +struct ctype {
>> +	const char *name;
>> +	const char *expect;
>> +	int actual[256];
>> +};
>> +
>> +static void test_ctype(const struct ctype *class)
>> +{
>> +	for (int i = 0; i < 256; i++) {
>> +		int expect = is_in(class->expect, i);
>> +		int actual = class->actual[i];
>> +		int res = test_assert(TEST_LOCATION(), class->name,
>> +				      actual == expect);
>> +		if (!res)
>> +			test_msg("%s classifies char %d (0x%02x) wrongly",
>> +				 class->name, i, i);
>> +	}
>>  }
>
> Somehow, the "test_assert" does not seem to be adding much value
> here (i.e. we can do "res = (actual == expect)" there).  Is this
> because we want to be able to report success, too?
>
>     ... goes and looks at test_assert() ...
>
> Ah, is it because we want to be able to "skip" (which pretends that
> the assert() was satisified).  OK, but then the error reporting from
> it is redundant with our own test_msg().

True, the test_msg() emits the old message here, but it doesn't have to
report that the check failed anymore, because test_assert() already
covers that part.  It would only have to report the misclassified
character and perhaps the expected result.

René
Taylor Blau Jan. 2, 2024, 6:55 p.m. UTC | #7
On Tue, Dec 26, 2023 at 10:45:59AM -0800, Junio C Hamano wrote:
> Achu Luma <ach.lumap@gmail.com> writes:
>
> > diff --git a/t/helper/test-ctype.c b/t/helper/test-ctype.c
> > deleted file mode 100644
> > index e5659df40b..0000000000
> > --- a/t/helper/test-ctype.c
> > +++ /dev/null
> > @@ -1,70 +0,0 @@
> > -#include "test-tool.h"
> > -
> > -static int rc;
> > -
> > -static void report_error(const char *class, int ch)
> > -{
> > -	printf("%s classifies char %d (0x%02x) wrongly\n", class, ch, ch);
> > -	rc = 1;
> > -}
>
> So, if we have a is_foo() that characterises a byte that ought to
> be "foo" but gets miscategorised not to be "foo", we used to
> pinpoint exactly the byte value that was an issue.  We did not do
> any early return ...
>
> > ...
> > -#define TEST_CLASS(t,s) {			\
> > -	int i;					\
> > -	for (i = 0; i < 256; i++) {		\
> > -		if (is_in(s, i) != t(i))	\
> > -			report_error(#t, i);	\
> > -	}					\
> > -	if (t(EOF))				\
> > -		report_error(#t, EOF);		\
> > -}
>
> ... and reported for all errors in the "class".
>
> > diff --git a/t/unit-tests/t-ctype.c b/t/unit-tests/t-ctype.c
> > new file mode 100644
> > index 0000000000..41189ba9f9
> > --- /dev/null
> > +++ b/t/unit-tests/t-ctype.c
> > @@ -0,0 +1,76 @@
> > +#include "test-lib.h"
> > +
> > +static int is_in(const char *s, int ch)
> > +{
> > +	/*
> > +	 * We can't find NUL using strchr. Accept it as the first
> > +	 * character in the spec -- there are no empty classes.
> > +	 */
> > +	if (ch == '\0')
> > +		return ch == *s;
> > +	if (*s == '\0')
> > +		s++;
> > +	return !!strchr(s, ch);
> > +}
> > +
> > +/* Macro to test a character type */
> > +#define TEST_CTYPE_FUNC(func, string)			\
> > +static void test_ctype_##func(void)				\
> > +{								\
> > +	int i;                                     	 	\
> > +	for (i = 0; i < 256; i++)                 		\
> > +		check_int(func(i), ==, is_in(string, i)); 	\
> > +}
>
> Now, we let check_int() to do the checking for each and every byte
> value for the class.  check_int() uses different reporting and shows
> the problematic value in a way that is more verbose and at the same
> time is a less specific and harder to understand:
>
> 		test_msg("   left: %"PRIdMAX, a);
> 		test_msg("  right: %"PRIdMAX, b);
>
> But that is probably the price to pay to use a more generic
> framework, I guess.

Perhaps I'm missing something here, since I haven't followed the
unit-test effort very closely, but this check_int() macro feels like it
might be overkill for what we're trying to do.

We know that the expected value is the result of is_in(string, i), so I
wonder if we might benefit from having an "assert_equals()" that looks
like:

    assert_equals(is_in(string, i), func(i));

Where we follow the usual convention of treating the first argument as
the expected value, and the second as the actual value. Then we could
format our error message to be more specific, like:

    test_msg("expected %d, got %d", expected, actual);

I think that this would be a little more readable, and still seems
flexible enough to support the kind of thing that check_int(..., ==,
...) is after.

> > +int cmd_main(int argc, const char **argv) {
> > +	/* Run all character type tests */
> > +	TEST(test_ctype_isspace(), "isspace() works as we expect");
> > +	TEST(test_ctype_isdigit(), "isdigit() works as we expect");
> > +	TEST(test_ctype_isalpha(), "isalpha() works as we expect");
> > +	TEST(test_ctype_isalnum(), "isalnum() works as we expect");
> > +	TEST(test_ctype_is_glob_special(), "is_glob_special() works as we expect");
> > +	TEST(test_ctype_is_regex_special(), "is_regex_special() works as we expect");
> > +	TEST(test_ctype_is_pathspec_magic(), "is_pathspec_magic() works as we expect");
> > +	TEST(test_ctype_isascii(), "isascii() works as we expect");
> > +	TEST(test_ctype_islower(), "islower() works as we expect");
> > +	TEST(test_ctype_isupper(), "isupper() works as we expect");
> > +	TEST(test_ctype_iscntrl(), "iscntrl() works as we expect");
> > +	TEST(test_ctype_ispunct(), "ispunct() works as we expect");
> > +	TEST(test_ctype_isxdigit(), "isxdigit() works as we expect");
> > +	TEST(test_ctype_isprint(), "isprint() works as we expect");
> > +
> > +	return test_done();
> > +}
>
> As a practice to use the unit-tests framework, the patch looks OK.
> helper/test-ctype.c indeed is an oddball that runs once and checks
> everything it wants to check, for which the unit tests framework is
> much more suited.

As an aside, I don't think we need the "works as we expect" suffix in
each test description. I personally would be fine with something like:

    TEST(test_ctype_isspace(), "isspace()");
    TEST(test_ctype_isdigit(), "isdigit()");
    ...

But don't feel strongly about it.

Thanks,
Taylor
diff mbox series

Patch

diff --git a/Makefile b/Makefile
index 88ba7a3c51..a4df48ba65 100644
--- a/Makefile
+++ b/Makefile
@@ -792,7 +792,6 @@  TEST_BUILTINS_OBJS += test-chmtime.o
 TEST_BUILTINS_OBJS += test-config.o
 TEST_BUILTINS_OBJS += test-crontab.o
 TEST_BUILTINS_OBJS += test-csprng.o
-TEST_BUILTINS_OBJS += test-ctype.o
 TEST_BUILTINS_OBJS += test-date.o
 TEST_BUILTINS_OBJS += test-delta.o
 TEST_BUILTINS_OBJS += test-dir-iterator.o
@@ -1341,6 +1340,7 @@  THIRD_PARTY_SOURCES += sha1dc/%
 
 UNIT_TEST_PROGRAMS += t-basic
 UNIT_TEST_PROGRAMS += t-strbuf
+UNIT_TEST_PROGRAMS += t-ctype
 UNIT_TEST_PROGS = $(patsubst %,$(UNIT_TEST_BIN)/%$X,$(UNIT_TEST_PROGRAMS))
 UNIT_TEST_OBJS = $(patsubst %,$(UNIT_TEST_DIR)/%.o,$(UNIT_TEST_PROGRAMS))
 UNIT_TEST_OBJS += $(UNIT_TEST_DIR)/test-lib.o
diff --git a/t/helper/test-ctype.c b/t/helper/test-ctype.c
deleted file mode 100644
index e5659df40b..0000000000
--- a/t/helper/test-ctype.c
+++ /dev/null
@@ -1,70 +0,0 @@ 
-#include "test-tool.h"
-
-static int rc;
-
-static void report_error(const char *class, int ch)
-{
-	printf("%s classifies char %d (0x%02x) wrongly\n", class, ch, ch);
-	rc = 1;
-}
-
-static int is_in(const char *s, int ch)
-{
-	/*
-	 * We can't find NUL using strchr. Accept it as the first
-	 * character in the spec -- there are no empty classes.
-	 */
-	if (ch == '\0')
-		return ch == *s;
-	if (*s == '\0')
-		s++;
-	return !!strchr(s, ch);
-}
-
-#define TEST_CLASS(t,s) {			\
-	int i;					\
-	for (i = 0; i < 256; i++) {		\
-		if (is_in(s, i) != t(i))	\
-			report_error(#t, i);	\
-	}					\
-	if (t(EOF))				\
-		report_error(#t, EOF);		\
-}
-
-#define DIGIT "0123456789"
-#define LOWER "abcdefghijklmnopqrstuvwxyz"
-#define UPPER "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
-#define PUNCT "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"
-#define ASCII \
-	"\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" \
-	"\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" \
-	"\x20\x21\x22\x23\x24\x25\x26\x27\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f" \
-	"\x30\x31\x32\x33\x34\x35\x36\x37\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f" \
-	"\x40\x41\x42\x43\x44\x45\x46\x47\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f" \
-	"\x50\x51\x52\x53\x54\x55\x56\x57\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f" \
-	"\x60\x61\x62\x63\x64\x65\x66\x67\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f" \
-	"\x70\x71\x72\x73\x74\x75\x76\x77\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
-#define CNTRL \
-	"\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" \
-	"\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" \
-	"\x7f"
-
-int cmd__ctype(int argc UNUSED, const char **argv UNUSED)
-{
-	TEST_CLASS(isdigit, DIGIT);
-	TEST_CLASS(isspace, " \n\r\t");
-	TEST_CLASS(isalpha, LOWER UPPER);
-	TEST_CLASS(isalnum, LOWER UPPER DIGIT);
-	TEST_CLASS(is_glob_special, "*?[\\");
-	TEST_CLASS(is_regex_special, "$()*+.?[\\^{|");
-	TEST_CLASS(is_pathspec_magic, "!\"#%&',-/:;<=>@_`~");
-	TEST_CLASS(isascii, ASCII);
-	TEST_CLASS(islower, LOWER);
-	TEST_CLASS(isupper, UPPER);
-	TEST_CLASS(iscntrl, CNTRL);
-	TEST_CLASS(ispunct, PUNCT);
-	TEST_CLASS(isxdigit, DIGIT "abcdefABCDEF");
-	TEST_CLASS(isprint, LOWER UPPER DIGIT PUNCT " ");
-
-	return rc;
-}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 37ba996539..33b9501c21 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -19,7 +19,6 @@  static struct test_cmd cmds[] = {
 	{ "config", cmd__config },
 	{ "crontab", cmd__crontab },
 	{ "csprng", cmd__csprng },
-	{ "ctype", cmd__ctype },
 	{ "date", cmd__date },
 	{ "delta", cmd__delta },
 	{ "dir-iterator", cmd__dir_iterator },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index 8a1a7c63da..b72f07ded9 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -12,7 +12,6 @@  int cmd__chmtime(int argc, const char **argv);
 int cmd__config(int argc, const char **argv);
 int cmd__crontab(int argc, const char **argv);
 int cmd__csprng(int argc, const char **argv);
-int cmd__ctype(int argc, const char **argv);
 int cmd__date(int argc, const char **argv);
 int cmd__delta(int argc, const char **argv);
 int cmd__dir_iterator(int argc, const char **argv);
diff --git a/t/t0070-fundamental.sh b/t/t0070-fundamental.sh
index 487bc8d905..a4756fbab9 100755
--- a/t/t0070-fundamental.sh
+++ b/t/t0070-fundamental.sh
@@ -9,10 +9,6 @@  Verify wrappers and compatibility functions.
 TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
-test_expect_success 'character classes (isspace, isalpha etc.)' '
-	test-tool ctype
-'
-
 test_expect_success 'mktemp to nonexistent directory prints filename' '
 	test_must_fail test-tool mktemp doesnotexist/testXXXXXX 2>err &&
 	grep "doesnotexist/test" err
diff --git a/t/unit-tests/t-ctype.c b/t/unit-tests/t-ctype.c
new file mode 100644
index 0000000000..41189ba9f9
--- /dev/null
+++ b/t/unit-tests/t-ctype.c
@@ -0,0 +1,76 @@ 
+#include "test-lib.h"
+
+static int is_in(const char *s, int ch)
+{
+	/*
+	 * We can't find NUL using strchr. Accept it as the first
+	 * character in the spec -- there are no empty classes.
+	 */
+	if (ch == '\0')
+		return ch == *s;
+	if (*s == '\0')
+		s++;
+	return !!strchr(s, ch);
+}
+
+/* Macro to test a character type */
+#define TEST_CTYPE_FUNC(func, string)			\
+static void test_ctype_##func(void)				\
+{								\
+	int i;                                     	 	\
+	for (i = 0; i < 256; i++)                 		\
+		check_int(func(i), ==, is_in(string, i)); 	\
+}
+
+#define DIGIT "0123456789"
+#define LOWER "abcdefghijklmnopqrstuvwxyz"
+#define UPPER "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
+#define PUNCT "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"
+#define ASCII \
+	"\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" \
+	"\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" \
+	"\x20\x21\x22\x23\x24\x25\x26\x27\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f" \
+	"\x30\x31\x32\x33\x34\x35\x36\x37\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f" \
+	"\x40\x41\x42\x43\x44\x45\x46\x47\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f" \
+	"\x50\x51\x52\x53\x54\x55\x56\x57\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f" \
+	"\x60\x61\x62\x63\x64\x65\x66\x67\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f" \
+	"\x70\x71\x72\x73\x74\x75\x76\x77\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+#define CNTRL \
+	"\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" \
+	"\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" \
+	"\x7f"
+
+TEST_CTYPE_FUNC(isdigit, DIGIT)
+TEST_CTYPE_FUNC(isspace, " \n\r\t")
+TEST_CTYPE_FUNC(isalpha, LOWER UPPER)
+TEST_CTYPE_FUNC(isalnum, LOWER UPPER DIGIT)
+TEST_CTYPE_FUNC(is_glob_special, "*?[\\")
+TEST_CTYPE_FUNC(is_regex_special, "$()*+.?[\\^{|")
+TEST_CTYPE_FUNC(is_pathspec_magic, "!\"#%&',-/:;<=>@_`~")
+TEST_CTYPE_FUNC(isascii, ASCII)
+TEST_CTYPE_FUNC(islower, LOWER)
+TEST_CTYPE_FUNC(isupper, UPPER)
+TEST_CTYPE_FUNC(iscntrl, CNTRL)
+TEST_CTYPE_FUNC(ispunct, PUNCT)
+TEST_CTYPE_FUNC(isxdigit, DIGIT "abcdefABCDEF")
+TEST_CTYPE_FUNC(isprint, LOWER UPPER DIGIT PUNCT " ")
+
+int cmd_main(int argc, const char **argv) {
+	/* Run all character type tests */
+	TEST(test_ctype_isspace(), "isspace() works as we expect");
+	TEST(test_ctype_isdigit(), "isdigit() works as we expect");
+	TEST(test_ctype_isalpha(), "isalpha() works as we expect");
+	TEST(test_ctype_isalnum(), "isalnum() works as we expect");
+	TEST(test_ctype_is_glob_special(), "is_glob_special() works as we expect");
+	TEST(test_ctype_is_regex_special(), "is_regex_special() works as we expect");
+	TEST(test_ctype_is_pathspec_magic(), "is_pathspec_magic() works as we expect");
+	TEST(test_ctype_isascii(), "isascii() works as we expect");
+	TEST(test_ctype_islower(), "islower() works as we expect");
+	TEST(test_ctype_isupper(), "isupper() works as we expect");
+	TEST(test_ctype_iscntrl(), "iscntrl() works as we expect");
+	TEST(test_ctype_ispunct(), "ispunct() works as we expect");
+	TEST(test_ctype_isxdigit(), "isxdigit() works as we expect");
+	TEST(test_ctype_isprint(), "isprint() works as we expect");
+
+	return test_done();
+}