diff mbox series

[1/2] t/unit-tests: match functions signature with trailing code

Message ID 20250107091932.126673-2-kuforiji98@gmail.com (mailing list archive)
State New
Headers show
Series t/unit-tests: convert hash tests to use clar | expand

Commit Message

Seyi Kuforiji Jan. 7, 2025, 9:19 a.m. UTC
The `generate-clar-decls.sh` script extracts signatures of test
functions from our unit tests, which will later get used by the clar to
automatically wire up tests. The sed command only matches lines that
ended immediately after `void)`, causing it to miss declarations with
additional content such as comments or annotations.

Relax the regular expression by making it match lines with trailing data
after the function signature. This ensures that all valid function
declarations are captured and formatted as `extern` declarations
regardless of their formatting style, improving the robustness of the
script when parsing `$suite` files.

This will be used in subsequent commits to match and capture the
function signature correctly, regardless of any trailing content.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
---
 t/unit-tests/generate-clar-decls.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--
2.34.1

Comments

Junio C Hamano Jan. 7, 2025, 6:16 p.m. UTC | #1
Seyi Kuforiji <kuforiji98@gmail.com> writes:

> The `generate-clar-decls.sh` script extracts signatures of test
> functions from our unit tests, which will later get used by the clar to
> automatically wire up tests. The sed command only matches lines that
> ended immediately after `void)`, causing it to miss declarations with
> additional content such as comments or annotations.
>
> Relax the regular expression by making it match lines with trailing data
> after the function signature. This ensures that all valid function
> declarations are captured and formatted as `extern` declarations
> regardless of their formatting style, improving the robustness of the
> script when parsing `$suite` files.
>
> This will be used in subsequent commits to match and capture the
> function signature correctly, regardless of any trailing content.

I am not sure if this is going in the right direction, though.

Especially for things like test suites that are looked at and worked
on only by develoeprs *and* these tools, being uniform and consistent
weighs more than being more flexible.

Let me state it in another way.  How many of the existing test
pieces are picked up by the current pattern, and among them how many
of them would see vast improvements if they are allowed to have
arbitrary garbage after their "I do not take any arguments" function
signature?  Are new tests you are migrating from outside the clar
world lose a lot if they are no longer allowed to have comments
there, or would it be suffice to have the comments before the
functions (which many of our function definition do anyway)?

A quick peek at [PATCH 2/2] tells me that this is not even something
that would make it easier to port the existing tests by allowing
more straight line-by-line copies or something.  The patch splits
many in-line test pieces in the "main" into separate functions, and
it does so in a rather unusual format, e.g.,

  void test_hash__multi_character(void) TEST_HASH_STR("abc",
          "a9993e364706816aba3e25717850c26c9cd0d89d",
          "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad")

where TEST_HASH_STR() expands to the function body that starts with
a "{" and ends with a "}".  It can well be written more like

    void test_hash__multi_character(void)
    {
	TEST_HASH_STR("abc",
        	"a9993e364706816aba3e25717850c26c9cd0d89d",
		"ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad");
    }

and we do not need this step at all if we did so.  Such a construct
would be a lot friendlier to the editors that auto-indent, too.

So, I do not quite see much value in this particular change.

> Mentored-by: Patrick Steinhardt <ps@pks.im>
> Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
> ---
>  t/unit-tests/generate-clar-decls.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/t/unit-tests/generate-clar-decls.sh b/t/unit-tests/generate-clar-decls.sh
> index 3b315c64b3..02e45cf0ba 100755
> --- a/t/unit-tests/generate-clar-decls.sh
> +++ b/t/unit-tests/generate-clar-decls.sh
> @@ -14,6 +14,6 @@ do
>  	suite_name=$(basename "$suite")
>  	suite_name=${suite_name%.c}
>  	suite_name=${suite_name#u-}
> -	sed -ne "s/^\(void test_${suite_name}__[a-zA-Z_0-9][a-zA-Z_0-9]*(void)\)$/extern \1;/p" "$suite" ||
> +	sed -ne "s/^\(void test_${suite_name}__[a-zA-Z_0-9][a-zA-Z_0-9]*(void)\).*/extern \1;/p" "$suite" ||
>  	exit 1
>  done >"$OUTPUT"
> --
> 2.34.1
Junio C Hamano Jan. 7, 2025, 6:41 p.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:

> A quick peek at [PATCH 2/2] tells me that this is not even something
> that would make it easier to port the existing tests by allowing
> more straight line-by-line copies or something.  The patch splits
> many in-line test pieces in the "main" into separate functions, and
> it does so in a rather unusual format, e.g.,
>
>   void test_hash__multi_character(void) TEST_HASH_STR("abc",
>           "a9993e364706816aba3e25717850c26c9cd0d89d",
>           "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad")
>
> where TEST_HASH_STR() expands to the function body that starts with
> a "{" and ends with a "}".  It can well be written more like
>
>     void test_hash__multi_character(void)
>     {
> 	TEST_HASH_STR("abc",
>         	"a9993e364706816aba3e25717850c26c9cd0d89d",
> 		"ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad");
>     }
>
> and we do not need this step at all if we did so.  Such a construct
> would be a lot friendlier to the editors that auto-indent, too.
>
> So, I do not quite see much value in this particular change.

Having said that, if this were more like that you write a series of

    DEF_HASH_TEST(multi_character, "abc", "a9993e...", "ba7816bf...")

and they expand to

    void test_hash__multi_character(void)
    {
	const char *expected[] = {"a9993e...", "ba7816bf..."};
	check_hash_data("abc", strlen("abc"), expected);
    }

then a preparatory step like this patch _might_ be justifiable.  You
may want to avoid having to write too many boilerplate, and a
special rule to find "DEF_HASH_TEST(name, ...)" and it might make
sense to add support to extract the name of the test function being
defined by the macro automatically.

Not that I think such a sequence of DEF_HASH_TEST(), one per line,
is an improvement at all (it also is unfriendly to editors that
auto-indent the same way as your original version).  I just wanted
to say that a change to the pattern to pick up the function name may
be justifiable if it were so.

Thanks.
Patrick Steinhardt Jan. 8, 2025, 6:14 a.m. UTC | #3
On Tue, Jan 07, 2025 at 10:16:33AM -0800, Junio C Hamano wrote:
> Seyi Kuforiji <kuforiji98@gmail.com> writes:
> 
> > The `generate-clar-decls.sh` script extracts signatures of test
> > functions from our unit tests, which will later get used by the clar to
> > automatically wire up tests. The sed command only matches lines that
> > ended immediately after `void)`, causing it to miss declarations with
> > additional content such as comments or annotations.
> >
> > Relax the regular expression by making it match lines with trailing data
> > after the function signature. This ensures that all valid function
> > declarations are captured and formatted as `extern` declarations
> > regardless of their formatting style, improving the robustness of the
> > script when parsing `$suite` files.
> >
> > This will be used in subsequent commits to match and capture the
> > function signature correctly, regardless of any trailing content.
> 
> I am not sure if this is going in the right direction, though.
> 
> Especially for things like test suites that are looked at and worked
> on only by develoeprs *and* these tools, being uniform and consistent
> weighs more than being more flexible.
> 
> Let me state it in another way.  How many of the existing test
> pieces are picked up by the current pattern, and among them how many
> of them would see vast improvements if they are allowed to have
> arbitrary garbage after their "I do not take any arguments" function
> signature?  Are new tests you are migrating from outside the clar
> world lose a lot if they are no longer allowed to have comments
> there, or would it be suffice to have the comments before the
> functions (which many of our function definition do anyway)?
> 
> A quick peek at [PATCH 2/2] tells me that this is not even something
> that would make it easier to port the existing tests by allowing
> more straight line-by-line copies or something.  The patch splits
> many in-line test pieces in the "main" into separate functions, and
> it does so in a rather unusual format, e.g.,
> 
>   void test_hash__multi_character(void) TEST_HASH_STR("abc",
>           "a9993e364706816aba3e25717850c26c9cd0d89d",
>           "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad")
> 
> where TEST_HASH_STR() expands to the function body that starts with
> a "{" and ends with a "}".  It can well be written more like
> 
>     void test_hash__multi_character(void)
>     {
> 	TEST_HASH_STR("abc",
>         	"a9993e364706816aba3e25717850c26c9cd0d89d",
> 		"ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad");
>     }
> 
> and we do not need this step at all if we did so.  Such a construct
> would be a lot friendlier to the editors that auto-indent, too.
> 
> So, I do not quite see much value in this particular change.

Yeah. This was something I proposed, but I already mentioned to Seyi
that I'm not all that happy with the outcome as it has a couple of
downsides, for example broken syntax highlighting in lots of editors. I
said he can send that version to the mailing list anyway and get some
feedback on it to figure out whether my discomfort with my own idea is
warranted or not. And your comment here basically confirms that my idea
wasn't that great after all :)

So I agree with you, let's scrap the idea and have proper function
bodies instead.

Patrick
Seyi Kuforiji Jan. 8, 2025, 8:31 a.m. UTC | #4
On Wed, 8 Jan 2025 at 07:14, Patrick Steinhardt <ps@pks.im> wrote:
>
> On Tue, Jan 07, 2025 at 10:16:33AM -0800, Junio C Hamano wrote:
> > Seyi Kuforiji <kuforiji98@gmail.com> writes:
> >
> > > The `generate-clar-decls.sh` script extracts signatures of test
> > > functions from our unit tests, which will later get used by the clar to
> > > automatically wire up tests. The sed command only matches lines that
> > > ended immediately after `void)`, causing it to miss declarations with
> > > additional content such as comments or annotations.
> > >
> > > Relax the regular expression by making it match lines with trailing data
> > > after the function signature. This ensures that all valid function
> > > declarations are captured and formatted as `extern` declarations
> > > regardless of their formatting style, improving the robustness of the
> > > script when parsing `$suite` files.
> > >
> > > This will be used in subsequent commits to match and capture the
> > > function signature correctly, regardless of any trailing content.
> >
> > I am not sure if this is going in the right direction, though.
> >
> > Especially for things like test suites that are looked at and worked
> > on only by develoeprs *and* these tools, being uniform and consistent
> > weighs more than being more flexible.
> >
> > Let me state it in another way.  How many of the existing test
> > pieces are picked up by the current pattern, and among them how many
> > of them would see vast improvements if they are allowed to have
> > arbitrary garbage after their "I do not take any arguments" function
> > signature?  Are new tests you are migrating from outside the clar
> > world lose a lot if they are no longer allowed to have comments
> > there, or would it be suffice to have the comments before the
> > functions (which many of our function definition do anyway)?
> >
> > A quick peek at [PATCH 2/2] tells me that this is not even something
> > that would make it easier to port the existing tests by allowing
> > more straight line-by-line copies or something.  The patch splits
> > many in-line test pieces in the "main" into separate functions, and
> > it does so in a rather unusual format, e.g.,
> >
> >   void test_hash__multi_character(void) TEST_HASH_STR("abc",
> >           "a9993e364706816aba3e25717850c26c9cd0d89d",
> >           "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad")
> >
> > where TEST_HASH_STR() expands to the function body that starts with
> > a "{" and ends with a "}".  It can well be written more like
> >
> >     void test_hash__multi_character(void)
> >     {
> >       TEST_HASH_STR("abc",
> >               "a9993e364706816aba3e25717850c26c9cd0d89d",
> >               "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad");
> >     }
> >
> > and we do not need this step at all if we did so.  Such a construct
> > would be a lot friendlier to the editors that auto-indent, too.
> >
> > So, I do not quite see much value in this particular change.
>
> Yeah. This was something I proposed, but I already mentioned to Seyi
> that I'm not all that happy with the outcome as it has a couple of
> downsides, for example broken syntax highlighting in lots of editors. I
> said he can send that version to the mailing list anyway and get some
> feedback on it to figure out whether my discomfort with my own idea is
> warranted or not. And your comment here basically confirms that my idea
> wasn't that great after all :)
>
> So I agree with you, let's scrap the idea and have proper function
> bodies instead.
>
> Patrick

Thank you for the feedback and insights. I'll update the patch to use
standard function bodies.

Thanks
Seyi
Junio C Hamano Jan. 8, 2025, 3:27 p.m. UTC | #5
Patrick Steinhardt <ps@pks.im> writes:

> So I agree with you, let's scrap the idea and have proper function
> bodies instead.

Yup, sometimes, simple, stupid, and good enough is the way to go.

We could do

-- >8 --

#define T(testname, input, expect1, expect256) \
	void test_hash__ ## testname(void) \
	{ \
		const char *expect[] = { expect1, expect256 }; \
		check_hash_data(input, strlen(input), expect); \
	} extern void test_hash__ ## testname()

T(empty_string, "", "da39...", "e3b0c4...");
T(single_character, "a", "86f7e4...", "ca97811...");

-- 8< --

which may not upset syntax-aware editors too much.

Unless there are more than several dozens of them, I do not think it
is worth it, though ;-)
Patrick Steinhardt Jan. 8, 2025, 4:15 p.m. UTC | #6
On Wed, Jan 08, 2025 at 07:27:37AM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > So I agree with you, let's scrap the idea and have proper function
> > bodies instead.
> 
> Yup, sometimes, simple, stupid, and good enough is the way to go.
> 
> We could do
> 
> -- >8 --
> 
> #define T(testname, input, expect1, expect256) \
> 	void test_hash__ ## testname(void) \
> 	{ \
> 		const char *expect[] = { expect1, expect256 }; \
> 		check_hash_data(input, strlen(input), expect); \
> 	} extern void test_hash__ ## testname()
> 
> T(empty_string, "", "da39...", "e3b0c4...");
> T(single_character, "a", "86f7e4...", "ca97811...");
> 
> -- 8< --
> 
> which may not upset syntax-aware editors too much.
> 
> Unless there are more than several dozens of them, I do not think it
> is worth it, though ;-)

Agreed, I'd go with the simple solution for now, which is to have
function bodies.

Patrick
diff mbox series

Patch

diff --git a/t/unit-tests/generate-clar-decls.sh b/t/unit-tests/generate-clar-decls.sh
index 3b315c64b3..02e45cf0ba 100755
--- a/t/unit-tests/generate-clar-decls.sh
+++ b/t/unit-tests/generate-clar-decls.sh
@@ -14,6 +14,6 @@  do
 	suite_name=$(basename "$suite")
 	suite_name=${suite_name%.c}
 	suite_name=${suite_name#u-}
-	sed -ne "s/^\(void test_${suite_name}__[a-zA-Z_0-9][a-zA-Z_0-9]*(void)\)$/extern \1;/p" "$suite" ||
+	sed -ne "s/^\(void test_${suite_name}__[a-zA-Z_0-9][a-zA-Z_0-9]*(void)\).*/extern \1;/p" "$suite" ||
 	exit 1
 done >"$OUTPUT"