diff mbox series

[v2,12/12] t0612: add tests to exercise Git/JGit reftable compatibility

Message ID 160b026e69547739a526fb6276a895904a4d33a8.1712555682.git.ps@pks.im (mailing list archive)
State Superseded
Headers show
Series t: exercise Git/JGit reftable compatibility | expand

Commit Message

Patrick Steinhardt April 8, 2024, 6:47 a.m. UTC
While the reftable format is a recent introduction in Git, JGit already
knows to read and write reftables since 2017. Given the complexity of
the format there is a very real risk of incompatibilities between those
two implementations, which is something that we really want to avoid.

Add some basic tests that verify that reftables written by Git and JGit
can be read by the respective other implementation. For now this test
suite is rather small, only covering basic functionality. But it serves
as a good starting point and can be extended over time.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 t/t0612-reftable-jgit-compatibility.sh | 132 +++++++++++++++++++++++++
 1 file changed, 132 insertions(+)
 create mode 100755 t/t0612-reftable-jgit-compatibility.sh

Comments

Eric Sunshine April 8, 2024, 7:10 a.m. UTC | #1
On Mon, Apr 8, 2024 at 2:47 AM Patrick Steinhardt <ps@pks.im> wrote:
> While the reftable format is a recent introduction in Git, JGit already
> knows to read and write reftables since 2017. Given the complexity of
> the format there is a very real risk of incompatibilities between those
> two implementations, which is something that we really want to avoid.
>
> Add some basic tests that verify that reftables written by Git and JGit
> can be read by the respective other implementation. For now this test
> suite is rather small, only covering basic functionality. But it serves
> as a good starting point and can be extended over time.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
> diff --git a/t/t0612-reftable-jgit-compatibility.sh b/t/t0612-reftable-jgit-compatibility.sh
> @@ -0,0 +1,132 @@
> +test_expect_success 'JGit can read multi-level index' '
> +               ...
> +               awk "
> +                   BEGIN {
> +                       print \"start\";
> +                       for (i = 0; i < 10000; i++)
> +                           printf \"create refs/heads/branch-%d HEAD\n\", i;
> +                       print \"commit\";
> +                   }
> +               " >input &&

I was going to suggest that you could accomplish this more easily
directly in shell (without employing `awk`):

    {
        echo start &&
        printf "create refs/heads/branch-%d HEAD\n" $(test_seq 0 9999) &&
        echo commit
    } >input &&

but then I realized that that could potentially run afoul of
command-line length limit on some platform due to the 0-9999 sequence.
Junio C Hamano April 8, 2024, 4:07 p.m. UTC | #2
Eric Sunshine <sunshine@sunshineco.com> writes:

> I was going to suggest that you could accomplish this more easily
> directly in shell (without employing `awk`):
>
>     {
>         echo start &&
>         printf "create refs/heads/branch-%d HEAD\n" $(test_seq 0 9999) &&
>         echo commit
>     } >input &&
>
> but then I realized that that could potentially run afoul of
> command-line length limit on some platform due to the 0-9999 sequence.

As xargs is supposed to know the system limit, perhaps

	test_seq 0 9999 | xargs printf "...%d...\n"

should work?
Eric Sunshine April 8, 2024, 4:19 p.m. UTC | #3
On Mon, Apr 8, 2024 at 12:07 PM Junio C Hamano <gitster@pobox.com> wrote:
> Eric Sunshine <sunshine@sunshineco.com> writes:
> > I was going to suggest that you could accomplish this more easily
> > directly in shell (without employing `awk`):
> >
> >     {
> >         echo start &&
> >         printf "create refs/heads/branch-%d HEAD\n" $(test_seq 0 9999) &&
> >         echo commit
> >     } >input &&
> >
> > but then I realized that that could potentially run afoul of
> > command-line length limit on some platform due to the 0-9999 sequence.
>
> As xargs is supposed to know the system limit, perhaps
>
>         test_seq 0 9999 | xargs printf "...%d...\n"
>
> should work?

Hmm, yes, that should work nicely.

Whether or not such a change is worthwhile is a different matter.
Although it is perhaps simpler and easier to read, Windows folk might
not appreciate it since it spawns at least three processes (and
perhaps more depending upon how test_seq is implemented), whereas the
`awk` approach spawns a single process.
Patrick Steinhardt April 8, 2024, 4:22 p.m. UTC | #4
On Mon, Apr 08, 2024 at 09:07:32AM -0700, Junio C Hamano wrote:
> Eric Sunshine <sunshine@sunshineco.com> writes:
> 
> > I was going to suggest that you could accomplish this more easily
> > directly in shell (without employing `awk`):
> >
> >     {
> >         echo start &&
> >         printf "create refs/heads/branch-%d HEAD\n" $(test_seq 0 9999) &&
> >         echo commit
> >     } >input &&
> >
> > but then I realized that that could potentially run afoul of
> > command-line length limit on some platform due to the 0-9999 sequence.
> 
> As xargs is supposed to know the system limit, perhaps
> 
> 	test_seq 0 9999 | xargs printf "...%d...\n"
> 
> should work?

Is there a reason why we want to avoid using awk(1) in the first place?
We use it in several other tests and I don't think that the resulting
code is all that bad.

Patrick
Eric Sunshine April 8, 2024, 4:29 p.m. UTC | #5
On Mon, Apr 8, 2024 at 12:19 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
> On Mon, Apr 8, 2024 at 12:07 PM Junio C Hamano <gitster@pobox.com> wrote:
> > Eric Sunshine <sunshine@sunshineco.com> writes:
> > > I was going to suggest that you could accomplish this more easily
> > > directly in shell (without employing `awk`):
> > >
> > >     {
> > >         echo start &&
> > >         printf "create refs/heads/branch-%d HEAD\n" $(test_seq 0 9999) &&
> > >         echo commit
> > >     } >input &&
> > >
> > > but then I realized that that could potentially run afoul of
> > > command-line length limit on some platform due to the 0-9999 sequence.
> >
> > As xargs is supposed to know the system limit, perhaps
> >
> >         test_seq 0 9999 | xargs printf "...%d...\n"
> >
> > should work?
>
> Hmm, yes, that should work nicely.
>
> Whether or not such a change is worthwhile is a different matter.
> Although it is perhaps simpler and easier to read, Windows folk might
> not appreciate it since it spawns at least three processes (and
> perhaps more depending upon how test_seq is implemented), whereas the
> `awk` approach spawns a single process.

And, to be clear, I have no objection to this patch's use of `awk`. It
is "good enough" as is; I was not asking Patrick to make the change,
but rather made the comment in case the idea hadn't occurred to him.
Jeff King April 8, 2024, 5:26 p.m. UTC | #6
On Mon, Apr 08, 2024 at 06:22:10PM +0200, Patrick Steinhardt wrote:

> On Mon, Apr 08, 2024 at 09:07:32AM -0700, Junio C Hamano wrote:
> > Eric Sunshine <sunshine@sunshineco.com> writes:
> > 
> > > I was going to suggest that you could accomplish this more easily
> > > directly in shell (without employing `awk`):
> > >
> > >     {
> > >         echo start &&
> > >         printf "create refs/heads/branch-%d HEAD\n" $(test_seq 0 9999) &&
> > >         echo commit
> > >     } >input &&
> > >
> > > but then I realized that that could potentially run afoul of
> > > command-line length limit on some platform due to the 0-9999 sequence.
> > 
> > As xargs is supposed to know the system limit, perhaps
> > 
> > 	test_seq 0 9999 | xargs printf "...%d...\n"
> > 
> > should work?
> 
> Is there a reason why we want to avoid using awk(1) in the first place?
> We use it in several other tests and I don't think that the resulting
> code is all that bad.

Using awk is fine, but I think in general if we can do something just as
easily without an extra process, that is preferable. Using xargs here
does not to me count as "just as easily". However, using a shell loop
might actually be more readable because you'd avoid the extra quoting.
I.e. either:

  for i in $(test_seq 10000)
  do
	echo "create refs/heads/branch-$i HEAD"
  done

or:

  i=0
  while test $i -lt 10000
  do
	echo "create refs/heads/branch-$i HEAD"
	i=$((i+1))
  done

I think the first one actually incurs an extra process anyway because of
the $(). The second one would not. Of course, the second one probably
needs &&-chaining and a "|| return 1" to work in our test snippet.

IMHO the nicest thing would be if our test_seq could take a format
parameter, like:

diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 2eccf100c0..c8f32eb409 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1404,6 +1404,13 @@ test_cmp_fspath () {
 # from 1.
 
 test_seq () {
+	local fmt="%d"
+	case "$1" in
+	-f)
+		fmt="$2"
+		shift 2
+		;;
+	esac
 	case $# in
 	1)	set 1 "$@" ;;
 	2)	;;
@@ -1412,7 +1419,7 @@ test_seq () {
 	test_seq_counter__=$1
 	while test "$test_seq_counter__" -le "$2"
 	do
-		echo "$test_seq_counter__"
+		printf "$fmt\n" "$test_seq_counter__"
 		test_seq_counter__=$(( $test_seq_counter__ + 1 ))
 	done
 }

Then you could just write:

  test_seq -f "create refs/heads/branch-%d HEAD" 0 9999

and I suspect there are several other spots which could be simplified as
well.

Anyway, I don't think it is worth derailing your series for this, but
just general food for thought.

-Peff
Junio C Hamano April 8, 2024, 9:51 p.m. UTC | #7
Patrick Steinhardt <ps@pks.im> writes:

>> As xargs is supposed to know the system limit, perhaps
>> 
>> 	test_seq 0 9999 | xargs printf "...%d...\n"
>> 
>> should work?
>
> Is there a reason why we want to avoid using awk(1) in the first place?
> We use it in several other tests and I don't think that the resulting
> code is all that bad.

There is no reason.  It was a canned reaction against "gee we will
bust shell's limit" to "use xargs then".

Of course, if we can do the loop in the shell and everything we do
in the loop with shell builtins (printf is often builtin in modern
shells but not always, if I recall correctly), that would be best,
and if the job is _that_ simple that we could do in the shell, it
would make "awk" a horrible choice ;-)
Junio C Hamano April 8, 2024, 9:52 p.m. UTC | #8
Jeff King <peff@peff.net> writes:

> I.e. either:
>
>   for i in $(test_seq 10000)
>   do
> 	echo "create refs/heads/branch-$i HEAD"
>   done
>
> or:
>
>   i=0
>   while test $i -lt 10000
>   do
> 	echo "create refs/heads/branch-$i HEAD"
> 	i=$((i+1))
>   done
>
> I think the first one actually incurs an extra process anyway because of
> the $(). The second one would not. Of course, the second one probably
> needs &&-chaining and a "|| return 1" to work in our test snippet.
> ...
> Then you could just write:
>
>   test_seq -f "create refs/heads/branch-%d HEAD" 0 9999
>
> and I suspect there are several other spots which could be simplified as
> well.
>
> Anyway, I don't think it is worth derailing your series for this, but
> just general food for thought.

Yup, I agree with everything you said.

Thanks.
Justin Tobler April 10, 2024, 8:43 p.m. UTC | #9
On 24/04/08 08:47AM, Patrick Steinhardt wrote:
> While the reftable format is a recent introduction in Git, JGit already
> knows to read and write reftables since 2017. Given the complexity of
> the format there is a very real risk of incompatibilities between those
> two implementations, which is something that we really want to avoid.
> 
> Add some basic tests that verify that reftables written by Git and JGit
> can be read by the respective other implementation. For now this test
> suite is rather small, only covering basic functionality. But it serves
> as a good starting point and can be extended over time.
> 
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  t/t0612-reftable-jgit-compatibility.sh | 132 +++++++++++++++++++++++++
>  1 file changed, 132 insertions(+)
>  create mode 100755 t/t0612-reftable-jgit-compatibility.sh
> 
> diff --git a/t/t0612-reftable-jgit-compatibility.sh b/t/t0612-reftable-jgit-compatibility.sh
> new file mode 100755
> index 0000000000..222464e360
> --- /dev/null
> +++ b/t/t0612-reftable-jgit-compatibility.sh
> @@ -0,0 +1,132 @@
> +#!/bin/sh
> +
> +test_description='reftables are compatible with JGit'
> +
> +GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
> +export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
> +GIT_TEST_DEFAULT_REF_FORMAT=reftable
> +export GIT_TEST_DEFAULT_REF_FORMAT
> +
> +# JGit does not support the 'link' DIRC extension.
> +GIT_TEST_SPLIT_INDEX=0
> +export GIT_TEST_SPLIT_INDEX
> +
> +. ./test-lib.sh
> +
> +if ! test_have_prereq JGIT

Do we want these tests to run in CI? As far as I can tell these tests
would always be skipped.

-Justin

> +then
> +	skip_all='skipping reftable JGit tests'
> +	test_done
> +fi
...
Patrick Steinhardt April 11, 2024, 8:55 a.m. UTC | #10
On Wed, Apr 10, 2024 at 03:43:58PM -0500, Justin Tobler wrote:
> On 24/04/08 08:47AM, Patrick Steinhardt wrote:
> > While the reftable format is a recent introduction in Git, JGit already
> > knows to read and write reftables since 2017. Given the complexity of
> > the format there is a very real risk of incompatibilities between those
> > two implementations, which is something that we really want to avoid.
> > 
> > Add some basic tests that verify that reftables written by Git and JGit
> > can be read by the respective other implementation. For now this test
> > suite is rather small, only covering basic functionality. But it serves
> > as a good starting point and can be extended over time.
> > 
> > Signed-off-by: Patrick Steinhardt <ps@pks.im>
> > ---
> >  t/t0612-reftable-jgit-compatibility.sh | 132 +++++++++++++++++++++++++
> >  1 file changed, 132 insertions(+)
> >  create mode 100755 t/t0612-reftable-jgit-compatibility.sh
> > 
> > diff --git a/t/t0612-reftable-jgit-compatibility.sh b/t/t0612-reftable-jgit-compatibility.sh
> > new file mode 100755
> > index 0000000000..222464e360
> > --- /dev/null
> > +++ b/t/t0612-reftable-jgit-compatibility.sh
> > @@ -0,0 +1,132 @@
> > +#!/bin/sh
> > +
> > +test_description='reftables are compatible with JGit'
> > +
> > +GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
> > +export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
> > +GIT_TEST_DEFAULT_REF_FORMAT=reftable
> > +export GIT_TEST_DEFAULT_REF_FORMAT
> > +
> > +# JGit does not support the 'link' DIRC extension.
> > +GIT_TEST_SPLIT_INDEX=0
> > +export GIT_TEST_SPLIT_INDEX
> > +
> > +. ./test-lib.sh
> > +
> > +if ! test_have_prereq JGIT
> 
> Do we want these tests to run in CI? As far as I can tell these tests
> would always be skipped.

They aren't skipped on Linux-based jobs at least. The JGIT prerequisite
is defined in "t/test-lib.sh" like so:

```
test_lazy_prereq JGIT '
	jgit --version
'
```

And because we have adapted "install-dependencies.sh" to install the
"jgit" executable into our custom PATH directory it is available,
meaning that the prereq is fulfilled.

But you're actually onto something: while the tests run on GitHub, they
don't run on GitLab. And this is caused by the unprivileged build that
we use on GitLab, where we install dependencies as a different user than
what we run tests with. Consequently, as the custom path is derived from
HOME, and HOME changes, we cannot find these binaries.

This is not a new issue, it's been there since the inception of the
GitLab pipelines. I'll include another patch on top to fix this.

Patrick
diff mbox series

Patch

diff --git a/t/t0612-reftable-jgit-compatibility.sh b/t/t0612-reftable-jgit-compatibility.sh
new file mode 100755
index 0000000000..222464e360
--- /dev/null
+++ b/t/t0612-reftable-jgit-compatibility.sh
@@ -0,0 +1,132 @@ 
+#!/bin/sh
+
+test_description='reftables are compatible with JGit'
+
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+GIT_TEST_DEFAULT_REF_FORMAT=reftable
+export GIT_TEST_DEFAULT_REF_FORMAT
+
+# JGit does not support the 'link' DIRC extension.
+GIT_TEST_SPLIT_INDEX=0
+export GIT_TEST_SPLIT_INDEX
+
+. ./test-lib.sh
+
+if ! test_have_prereq JGIT
+then
+	skip_all='skipping reftable JGit tests'
+	test_done
+fi
+
+if ! test_have_prereq SHA1
+then
+	skip_all='skipping reftable JGit tests; JGit does not support SHA256 reftables'
+	test_done
+fi
+
+test_commit_jgit () {
+	touch "$1" &&
+	jgit add "$1" &&
+	jgit commit -m "$1"
+}
+
+test_same_refs () {
+	git show-ref --head >cgit.actual &&
+	jgit show-ref >jgit-tabs.actual &&
+	tr "\t" " " <jgit-tabs.actual >jgit.actual &&
+	test_cmp cgit.actual jgit.actual
+}
+
+test_same_ref () {
+	git rev-parse "$1" >cgit.actual &&
+	jgit rev-parse "$1" >jgit.actual &&
+	test_cmp cgit.actual jgit.actual
+}
+
+test_same_reflog () {
+	git reflog "$*" >cgit.actual &&
+	jgit reflog "$*" >jgit-newline.actual &&
+	sed '/^$/d' <jgit-newline.actual >jgit.actual &&
+	test_cmp cgit.actual jgit.actual
+}
+
+test_expect_success 'CGit repository can be read by JGit' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit A &&
+		test_same_refs &&
+		test_same_ref HEAD &&
+		test_same_reflog HEAD
+	)
+'
+
+test_expect_success 'JGit repository can be read by CGit' '
+	test_when_finished "rm -rf repo" &&
+	jgit init repo &&
+	(
+		cd repo &&
+
+		touch file &&
+		jgit add file &&
+		jgit commit -m "initial commit" &&
+
+		# Note that we must convert the ref storage after we have
+		# written the default branch. Otherwise JGit will end up with
+		# no HEAD at all.
+		jgit convert-ref-storage --format=reftable &&
+
+		test_same_refs &&
+		test_same_ref HEAD &&
+		# Interestingly, JGit cannot read its own reflog here. CGit can
+		# though.
+		printf "%s HEAD@{0}: commit (initial): initial commit" "$(git rev-parse --short HEAD)" >expect &&
+		git reflog HEAD >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'mixed writes from JGit and CGit' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+
+		test_commit A &&
+		test_commit_jgit B &&
+		test_commit C &&
+		test_commit_jgit D &&
+
+		test_same_refs &&
+		test_same_ref HEAD &&
+		test_same_reflog HEAD
+	)
+'
+
+test_expect_success 'JGit can read multi-level index' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+
+		test_commit A &&
+		awk "
+		    BEGIN {
+			print \"start\";
+			for (i = 0; i < 10000; i++)
+			    printf \"create refs/heads/branch-%d HEAD\n\", i;
+			print \"commit\";
+		    }
+		" >input &&
+		git update-ref --stdin <input &&
+
+		test_same_refs &&
+		test_same_ref refs/heads/branch-1 &&
+		test_same_ref refs/heads/branch-5738 &&
+		test_same_ref refs/heads/branch-9999
+	)
+'
+
+test_done