diff mbox series

[12/12] t0612: add tests to exercise Git/JGit reftable compatibility

Message ID db66dd4155d80b714719e80ff90f64c1d36b97d0.1712235356.git.ps@pks.im (mailing list archive)
State Superseded
Headers show
Series t: exercise Git/JGit reftable compatibility | expand

Commit Message

Patrick Steinhardt April 4, 2024, 1:25 p.m. UTC
While the reftable format is a recent introduction in Git, JGit already
knows to read and write reftables since 2017. Given the complexity of
the format there is a very real risk of incompatibilities between those
two implementations, which is something that we really want to avoid.

Add some basic tests that verify that reftables written by Git and JGit
can be read by the respective other implementation. For now this test
suite is rather small, only covering basic functionality. But it serves
as a good starting point and can be extended over time.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 t/t0612-reftable-jgit-compatibility.sh | 115 +++++++++++++++++++++++++
 1 file changed, 115 insertions(+)
 create mode 100755 t/t0612-reftable-jgit-compatibility.sh

Comments

Han-Wen Nienhuys April 4, 2024, 9:40 p.m. UTC | #1
On Thu, Apr 4, 2024 at 5:01 PM Patrick Steinhardt <ps@pks.im> wrote:
> +
> +test_same_refs () {
> +       git show-ref --head >cgit.actual &&
> +       jgit show-ref >jgit-tabs.actual &&

This seeks to the start and then iterates to the end, likely skipping
indexes. If you want to test for indexes etc. you should also sample
random refs from the namespace and see if they are returned correctly.

> +test_expect_success 'JGit repository can be read by CGit' '
> +       test_when_finished "rm -rf repo" &&
> +       # JGit does not provide a way to create a reftable-enabled repository.

You can do "jgit init && jgit convert-refstorage --format=reftable"
Patrick Steinhardt April 5, 2024, 9:41 a.m. UTC | #2
On Thu, Apr 04, 2024 at 11:40:29PM +0200, Han-Wen Nienhuys wrote:
> On Thu, Apr 4, 2024 at 5:01 PM Patrick Steinhardt <ps@pks.im> wrote:
> > +
> > +test_same_refs () {
> > +       git show-ref --head >cgit.actual &&
> > +       jgit show-ref >jgit-tabs.actual &&
> 
> This seeks to the start and then iterates to the end, likely skipping
> indexes. If you want to test for indexes etc. you should also sample
> random refs from the namespace and see if they are returned correctly.

True, I'll add that.

> > +test_expect_success 'JGit repository can be read by CGit' '
> > +       test_when_finished "rm -rf repo" &&
> > +       # JGit does not provide a way to create a reftable-enabled repository.
> 
> You can do "jgit init && jgit convert-refstorage --format=reftable"

Perfect, thanks for this hint!

Patrick
Han-Wen Nienhuys April 6, 2024, 1:03 p.m. UTC | #3
On Fri, Apr 5, 2024 at 11:41 AM Patrick Steinhardt <ps@pks.im> wrote:
> > > +test_expect_success 'JGit repository can be read by CGit' '
> > > +       test_when_finished "rm -rf repo" &&
> > > +       # JGit does not provide a way to create a reftable-enabled repository.
> >
> > You can do "jgit init && jgit convert-refstorage --format=reftable"
>
> Perfect, thanks for this hint!

on a tangent: I wrote this a long time ago, and it does no locking,
which probably leads to unpleasant surprises if it is run in parallel
with other ref updates. What would be a foolproof way to stop other
processes from stomping over the repo while the conversion is ongoing?
Maybe create a packed-refs.lock file?
Patrick Steinhardt April 6, 2024, 3:53 p.m. UTC | #4
On Sat, Apr 06, 2024 at 03:03:20PM +0200, Han-Wen Nienhuys wrote:
> On Fri, Apr 5, 2024 at 11:41 AM Patrick Steinhardt <ps@pks.im> wrote:
> > > > +test_expect_success 'JGit repository can be read by CGit' '
> > > > +       test_when_finished "rm -rf repo" &&
> > > > +       # JGit does not provide a way to create a reftable-enabled repository.
> > >
> > > You can do "jgit init && jgit convert-refstorage --format=reftable"
> >
> > Perfect, thanks for this hint!
> 
> on a tangent: I wrote this a long time ago, and it does no locking,
> which probably leads to unpleasant surprises if it is run in parallel
> with other ref updates. What would be a foolproof way to stop other
> processes from stomping over the repo while the conversion is ongoing?
> Maybe create a packed-refs.lock file?

Yes, this is indeed a problem that I have been thinking about quite a
lot recently as I want to introduce migration tooling in the next
release cycle. I was quite happy to learn that you've got something in
JGit and was hoping that you found a way to avoid races. Too bad you
seem to have the same issue as I do right now.

In any case, creating a "packed-refs.lock" is not sufficient. While it
does protect against ref deletion and repacking refs, it doesn't protect
against ref creation and updates. And in fact there is no way to guard
against these at all as all that the "files" backend does is to create
or update a single loose file without taking any central lock.

I was briefly thinking about a "migration" repository extension. Once
the migration starts we add "extensions.ref-migration" to the repo,
which is an extension that no client knows how to handle. Consequently,
while the extension is set, any _new_ Git processes would bail out and
refuse to update the repository.

That only solves part of the problem though. What it doesn't protect
against is already-running Git processes that have already read the
config.

There are of course very hacky ways to get closer to the goal. E.g.:

    1. Repack all refs.

    2. Delete the now-empty "refs/" hierarchy.

    3. Create a file "refs" with invalid contents.

    4. Lock the "packed-refs" file.

    5. Lock the "HEAD" file.

Now you can be sure that there is no way to create or update normal
refs. But it's still incomplete as pseudo-refs can be written to. And it
does feel very fragile overall, especially if you need to roll back the
migration due to whatever reason.

So I dunno, until now I didn't yet find any way to reliably avoid racy
ref updates during a conversion. If you or anybody else got some ideas
I'd be very happy to hear about them.

Patrick
Patrick Steinhardt April 8, 2024, 5:24 a.m. UTC | #5
On Thu, Apr 04, 2024 at 11:40:29PM +0200, Han-Wen Nienhuys wrote:
> On Thu, Apr 4, 2024 at 5:01 PM Patrick Steinhardt <ps@pks.im> wrote:
> > +test_expect_success 'JGit repository can be read by CGit' '
> > +       test_when_finished "rm -rf repo" &&
> > +       # JGit does not provide a way to create a reftable-enabled repository.
> 
> You can do "jgit init && jgit convert-refstorage --format=reftable"

By the way, the above command does not work as-is as it will end up with
a repository that has no "HEAD" reference. I can work around that
though by first writing the default branch and then converting the ref
storage.

Patrick
diff mbox series

Patch

diff --git a/t/t0612-reftable-jgit-compatibility.sh b/t/t0612-reftable-jgit-compatibility.sh
new file mode 100755
index 0000000000..f25ea880a0
--- /dev/null
+++ b/t/t0612-reftable-jgit-compatibility.sh
@@ -0,0 +1,115 @@ 
+#!/bin/sh
+
+test_description='reftables are compatible with JGit'
+
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+GIT_TEST_DEFAULT_REF_FORMAT=reftable
+export GIT_TEST_DEFAULT_REF_FORMAT
+
+# JGit does not support the 'link' DIRC extension.
+GIT_TEST_SPLIT_INDEX=0
+export GIT_TEST_SPLIT_INDEX
+
+. ./test-lib.sh
+
+if ! test_have_prereq JGIT
+then
+	skip_all='skipping reftable JGit tests'
+	test_done
+fi
+
+if ! test_have_prereq SHA1
+then
+	skip_all='skipping reftable JGit tests; JGit does not support SHA256 reftables'
+	test_done
+fi
+
+test_commit_jgit () {
+	touch "$1" &&
+	jgit add "$1" &&
+	jgit commit -m "$1"
+}
+
+test_same_refs () {
+	git show-ref --head >cgit.actual &&
+	jgit show-ref >jgit-tabs.actual &&
+	tr "\t" " " <jgit-tabs.actual >jgit.actual &&
+	test_cmp cgit.actual jgit.actual
+}
+
+test_same_reflog () {
+	git reflog "$*" >cgit.actual &&
+	jgit reflog "$*" >jgit-newline.actual &&
+	sed '/^$/d' <jgit-newline.actual >jgit.actual &&
+	test_cmp cgit.actual jgit.actual
+}
+
+test_expect_success 'CGit repository can be read by JGit' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit A &&
+		test_same_refs &&
+		test_same_reflog HEAD
+	)
+'
+
+test_expect_success 'JGit repository can be read by CGit' '
+	test_when_finished "rm -rf repo" &&
+	# JGit does not provide a way to create a reftable-enabled repository.
+	git init repo &&
+	(
+		cd repo &&
+		touch file &&
+		jgit add file &&
+		jgit commit -m "initial commit" &&
+
+		test_same_refs &&
+		# Interestingly, JGit cannot read its own reflog here. CGit can
+		# though.
+		printf "%s HEAD@{0}: commit (initial): initial commit" "$(git rev-parse --short HEAD)" >expect &&
+		git reflog HEAD >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'mixed writes from JGit and CGit' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+
+		test_commit A &&
+		test_commit_jgit B &&
+		test_commit C &&
+		test_commit_jgit D &&
+
+		test_same_refs &&
+		test_same_reflog HEAD
+	)
+'
+
+test_expect_success 'JGit can read multi-level index' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+
+		test_commit A &&
+		awk "
+		    BEGIN {
+			print \"start\";
+			for (i = 0; i < 10000; i++)
+			    printf \"create refs/heads/branch-%d HEAD\n\", i;
+			print \"commit\";
+		    }
+		" >input &&
+		git update-ref --stdin <input &&
+
+		test_same_refs
+	)
+'
+
+test_done