[v2,1/3] p5400: add perf tests for git-receive-pack(1)

Message ID	2f6c4e3d102e71104d7d00c78631b149b880609a.1624858240.git.ps@pks.im (mailing list archive)
State	New, archived
Headers	show Return-Path: <git-owner@kernel.org> Date: Mon, 28 Jun 2021 07:33:07 +0200 From: Patrick Steinhardt <ps@pks.im> To: git@vger.kernel.org Cc: Jeff King <peff@peff.net>, Felipe Contreras <felipe.contreras@gmail.com>, SZEDER =?iso-8859-1?q?G=E1bor?= <szeder.dev@gmail.com>, Chris Torek <chris.torek@gmail.com>, =?iso-8859-1?q?=C6var_Arnfj=F6r=F0?= Bjarmason <avarab@gmail.com>, Junio C Hamano <gitster@pobox.com> Subject: [PATCH v2 1/3] p5400: add perf tests for git-receive-pack(1) Message-ID: <2f6c4e3d102e71104d7d00c78631b149b880609a.1624858240.git.ps@pks.im> References: <cover.1624858240.git.ps@pks.im> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="0KrmYKynENU2bVfC" Content-Disposition: inline In-Reply-To: <cover.1624858240.git.ps@pks.im> Precedence: bulk
Series	Speed up connectivity checks via bitmaps \| expand [v2,0/3] Speed up connectivity checks via bitmaps [v2,1/3] p5400: add perf tests for git-receive-pack(1) [v2,2/3] receive-pack: skip connectivity checks on delete-only commands [v2,3/3] connected: implement connectivity check using bitmaps

Message ID

2f6c4e3d102e71104d7d00c78631b149b880609a.1624858240.git.ps@pks.im (mailing list archive)

State

New, archived

Headers

Date: Mon, 28 Jun 2021 07:33:07 +0200
From: Patrick Steinhardt <ps@pks.im>
To: git@vger.kernel.org
Cc: Jeff King <peff@peff.net>, Felipe Contreras <felipe.contreras@gmail.com>,
 SZEDER =?iso-8859-1?q?G=E1bor?= <szeder.dev@gmail.com>,
 Chris Torek <chris.torek@gmail.com>,
 =?iso-8859-1?q?=C6var_Arnfj=F6r=F0?= Bjarmason <avarab@gmail.com>,
 Junio C Hamano <gitster@pobox.com>
Subject: [PATCH v2 1/3] p5400: add perf tests for git-receive-pack(1)
Message-ID: 
 <2f6c4e3d102e71104d7d00c78631b149b880609a.1624858240.git.ps@pks.im>
References: <cover.1624858240.git.ps@pks.im>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha512;
        protocol="application/pgp-signature"; boundary="0KrmYKynENU2bVfC"
Content-Disposition: inline
In-Reply-To: <cover.1624858240.git.ps@pks.im>
Precedence: bulk

Series

Speed up connectivity checks via bitmaps | expand

Commit Message

Patrick Steinhardt June 28, 2021, 5:33 a.m. UTC

We'll the connectivity check logic for git-receive-pack(1) in the
following commits to make it perform better. As a preparatory step, add
some benchmarks such that we can measure these changes.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 t/perf/p5400-receive-pack.sh | 97 ++++++++++++++++++++++++++++++++++++
 1 file changed, 97 insertions(+)
 create mode 100755 t/perf/p5400-receive-pack.sh

Comments

Ævar Arnfjörð Bjarmason June 28, 2021, 7:49 a.m. UTC | #1

On Mon, Jun 28 2021, Patrick Steinhardt wrote:

> [[PGP Signed Part:Undecided]]
> We'll the connectivity check logic for git-receive-pack(1) in the
> following commits to make it perform better. As a preparatory step, add
> some benchmarks such that we can measure these changes.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  t/perf/p5400-receive-pack.sh | 97 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 97 insertions(+)
>  create mode 100755 t/perf/p5400-receive-pack.sh
>
> diff --git a/t/perf/p5400-receive-pack.sh b/t/perf/p5400-receive-pack.sh
> new file mode 100755
> index 0000000000..a945e014a3
> --- /dev/null
> +++ b/t/perf/p5400-receive-pack.sh
> @@ -0,0 +1,97 @@
> +#!/bin/sh
> +
> +test_description="Tests performance of receive-pack"
> +
> +. ./perf-lib.sh
> +
> +test_perf_large_repo

From the runtime I think this just needs test_perf_default_repo, no?
I.e. we should only have *_large_* for cases where git.git is too small
to produce meaningful results.

Part of th problem is that git.git has become larger over time...

> +test_expect_success 'setup' '
> +	# Create a main branch such that we do not have to rely on any specific
> +	# branch to exist in the perf repository.
> +	git switch --force-create main &&
> +
> +	# Set up a pre-receive hook such that no refs will ever be changed.
> +	# This easily allows multiple perf runs, but still exercises
> +	# server-side reference negotiation and checking for consistency.
> +	mkdir hooks &&
> +	write_script hooks/pre-receive <<-EOF &&
> +		#!/bin/sh

You don't need the #!/bin/sh here, and it won't be used. write_script()
adds it (or the wanted shell path).

> +		echo "failed in pre-receive hook"
> +		exit 1
> +	EOF
> +	cat >config <<-EOF &&
> +		[core]
> +			hooksPath=$(pwd)/hooks
> +	EOF

Easier understood IMO as:

    git config -f config core.hooksPath ...

> +	GIT_CONFIG_GLOBAL="$(pwd)/config" &&
> +	export GIT_CONFIG_GLOBAL &&
> +
> +	git switch --create updated &&
> +	test_commit --no-tag updated
> +'
> +
> +setup_empty() {
> +	git init --bare "$2"
> +}

I searched ahead for setup_empty, looked unused, but...

> +setup_clone() {
> +	git clone --bare --no-local --branch main "$1" "$2"
> +}
> +
> +setup_clone_bitmap() {
> +	git clone --bare --no-local --branch main "$1" "$2" &&
> +	git -C "$2" repack -Adb
> +}
> +
> +# Create a reference for each commit in the target repository with extra-refs.
> +# While this may be an atypical setup, biggish repositories easily end up with
> +# hundreds of thousands of refs, and this is a good enough approximation.
> +setup_extrarefs() {
> +	git clone --bare --no-local --branch main "$1" "$2" &&
> +	git -C "$2" log --all --format="tformat:create refs/commit/%h %H" |
> +		git -C "$2" update-ref --stdin
> +}
> +
> +# Create a reference for each commit in the target repository with extra-refs.
> +# While this may be an atypical setup, biggish repositories easily end up with
> +# hundreds of thousands of refs, and this is a good enough approximation.
> +setup_extrarefs_bitmap() {
> +	git clone --bare --no-local --branch main "$1" "$2" &&
> +	git -C "$2" log --all --format="tformat:create refs/commit/%h %H" |
> +		git -C "$2" update-ref --stdin &&
> +	git -C "$2" repack -Adb
> +}
> +
> +for repo in empty clone clone_bitmap extrarefs extrarefs_bitmap
> +do
> +	test_expect_success "$repo setup" '

> +		rm -rf target.git &&
> +		setup_$repo "$(pwd)" target.git

...here we use it via interpolation.

I'd find this whole pattern much easier to understand if the setups were
just a bunch of test_expect_success that created a repo_empty.git,
repo_extrarefs.git etc. Then this loop would be:

    for repo in repo*.git ...

I'd think that would also give you more meaningful perf data, as now the
OS will churn between the clone & the subsequent push tests, better to
do all the setup, then all the different perf tests.

Perhaps there's also a way to re-use this setup across different runs, I
don't know/can't remember if t/perf has a less transient thing than the
normal trash directory to use for that.

Patrick Steinhardt June 29, 2021, 6:18 a.m. UTC | #2

On Mon, Jun 28, 2021 at 09:49:54AM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> On Mon, Jun 28 2021, Patrick Steinhardt wrote:
> 
> > [[PGP Signed Part:Undecided]]
> > We'll the connectivity check logic for git-receive-pack(1) in the
> > following commits to make it perform better. As a preparatory step, add
> > some benchmarks such that we can measure these changes.
> >
> > Signed-off-by: Patrick Steinhardt <ps@pks.im>
> > ---
> >  t/perf/p5400-receive-pack.sh | 97 ++++++++++++++++++++++++++++++++++++
> >  1 file changed, 97 insertions(+)
> >  create mode 100755 t/perf/p5400-receive-pack.sh
> >
> > diff --git a/t/perf/p5400-receive-pack.sh b/t/perf/p5400-receive-pack.sh
> > new file mode 100755
> > index 0000000000..a945e014a3
> > --- /dev/null
> > +++ b/t/perf/p5400-receive-pack.sh
> > @@ -0,0 +1,97 @@
> > +#!/bin/sh
> > +
> > +test_description="Tests performance of receive-pack"
> > +
> > +. ./perf-lib.sh
> > +
> > +test_perf_large_repo
> 
> From the runtime I think this just needs test_perf_default_repo, no?
> I.e. we should only have *_large_* for cases where git.git is too small
> to produce meaningful results.
> 
> Part of th problem is that git.git has become larger over time...

I did these tests for 3/3 with git.git first, and results were
significantly different. The performance issues I'm trying to fix with
the connectivity check really only start to show up with largish
repositories.

> > +test_expect_success 'setup' '
> > +	# Create a main branch such that we do not have to rely on any specific
> > +	# branch to exist in the perf repository.
> > +	git switch --force-create main &&
> > +
> > +	# Set up a pre-receive hook such that no refs will ever be changed.
> > +	# This easily allows multiple perf runs, but still exercises
> > +	# server-side reference negotiation and checking for consistency.
> > +	mkdir hooks &&
> > +	write_script hooks/pre-receive <<-EOF &&
> > +		#!/bin/sh
> 
> You don't need the #!/bin/sh here, and it won't be used. write_script()
> adds it (or the wanted shell path).

Makes sense.

> > +		echo "failed in pre-receive hook"
> > +		exit 1
> > +	EOF
> > +	cat >config <<-EOF &&
> > +		[core]
> > +			hooksPath=$(pwd)/hooks
> > +	EOF
> 
> Easier understood IMO as:
> 
>     git config -f config core.hooksPath ...

Yup, will change.

> > +	GIT_CONFIG_GLOBAL="$(pwd)/config" &&
> > +	export GIT_CONFIG_GLOBAL &&
> > +
> > +	git switch --create updated &&
> > +	test_commit --no-tag updated
> > +'
> > +
> > +setup_empty() {
> > +	git init --bare "$2"
> > +}
> 
> I searched ahead for setup_empty, looked unused, but...
> 
> > +setup_clone() {
> > +	git clone --bare --no-local --branch main "$1" "$2"
> > +}
> > +
> > +setup_clone_bitmap() {
> > +	git clone --bare --no-local --branch main "$1" "$2" &&
> > +	git -C "$2" repack -Adb
> > +}
> > +
> > +# Create a reference for each commit in the target repository with extra-refs.
> > +# While this may be an atypical setup, biggish repositories easily end up with
> > +# hundreds of thousands of refs, and this is a good enough approximation.
> > +setup_extrarefs() {
> > +	git clone --bare --no-local --branch main "$1" "$2" &&
> > +	git -C "$2" log --all --format="tformat:create refs/commit/%h %H" |
> > +		git -C "$2" update-ref --stdin
> > +}
> > +
> > +# Create a reference for each commit in the target repository with extra-refs.
> > +# While this may be an atypical setup, biggish repositories easily end up with
> > +# hundreds of thousands of refs, and this is a good enough approximation.
> > +setup_extrarefs_bitmap() {
> > +	git clone --bare --no-local --branch main "$1" "$2" &&
> > +	git -C "$2" log --all --format="tformat:create refs/commit/%h %H" |
> > +		git -C "$2" update-ref --stdin &&
> > +	git -C "$2" repack -Adb
> > +}
> > +
> > +for repo in empty clone clone_bitmap extrarefs extrarefs_bitmap
> > +do
> > +	test_expect_success "$repo setup" '
> 
> > +		rm -rf target.git &&
> > +		setup_$repo "$(pwd)" target.git
> 
> ...here we use it via interpolation.
> 
> I'd find this whole pattern much easier to understand if the setups were
> just a bunch of test_expect_success that created a repo_empty.git,
> repo_extrarefs.git etc. Then this loop would be:
> 
>     for repo in repo*.git ...
> 
> I'd think that would also give you more meaningful perf data, as now the
> OS will churn between the clone & the subsequent push tests, better to
> do all the setup, then all the different perf tests.
> 
> Perhaps there's also a way to re-use this setup across different runs, I
> don't know/can't remember if t/perf has a less transient thing than the
> normal trash directory to use for that.

I originally had code like this, but the issue with first creating all
the repos is that it requires lots of disk space with large repos given
that we'll clone it once per setup. Combined with the fact that I
often run tests in tmpfs, this led to out-of-memory situations quite
fast given that I had 3x6GB repositories plus the seeded packfiles in
RAM.

This is why I've changed the setup to do the setup as we go, to bring
disk usage down to something sane.

Patrick

Ævar Arnfjörð Bjarmason June 29, 2021, 12:09 p.m. UTC | #3

On Tue, Jun 29 2021, Patrick Steinhardt wrote:

> [[PGP Signed Part:Undecided]]
> On Mon, Jun 28, 2021 at 09:49:54AM +0200, Ævar Arnfjörð Bjarmason wrote:
>> 
>> On Mon, Jun 28 2021, Patrick Steinhardt wrote:
>> 
>> > [[PGP Signed Part:Undecided]]
>> > We'll the connectivity check logic for git-receive-pack(1) in the
>> > following commits to make it perform better. As a preparatory step, add
>> > some benchmarks such that we can measure these changes.
>> >
>> > Signed-off-by: Patrick Steinhardt <ps@pks.im>
>> > ---
>> >  t/perf/p5400-receive-pack.sh | 97 ++++++++++++++++++++++++++++++++++++
>> >  1 file changed, 97 insertions(+)
>> >  create mode 100755 t/perf/p5400-receive-pack.sh
>> >
>> > diff --git a/t/perf/p5400-receive-pack.sh b/t/perf/p5400-receive-pack.sh
>> > new file mode 100755
>> > index 0000000000..a945e014a3
>> > --- /dev/null
>> > +++ b/t/perf/p5400-receive-pack.sh
>> > @@ -0,0 +1,97 @@
>> > +#!/bin/sh
>> > +
>> > +test_description="Tests performance of receive-pack"
>> > +
>> > +. ./perf-lib.sh
>> > +
>> > +test_perf_large_repo
>> 
>> From the runtime I think this just needs test_perf_default_repo, no?
>> I.e. we should only have *_large_* for cases where git.git is too small
>> to produce meaningful results.
>> 
>> Part of th problem is that git.git has become larger over time...
>
> I did these tests for 3/3 with git.git first, and results were
> significantly different. The performance issues I'm trying to fix with
> the connectivity check really only start to show up with largish
> repositories.
>
>> > +test_expect_success 'setup' '
>> > +	# Create a main branch such that we do not have to rely on any specific
>> > +	# branch to exist in the perf repository.
>> > +	git switch --force-create main &&
>> > +
>> > +	# Set up a pre-receive hook such that no refs will ever be changed.
>> > +	# This easily allows multiple perf runs, but still exercises
>> > +	# server-side reference negotiation and checking for consistency.
>> > +	mkdir hooks &&
>> > +	write_script hooks/pre-receive <<-EOF &&
>> > +		#!/bin/sh
>> 
>> You don't need the #!/bin/sh here, and it won't be used. write_script()
>> adds it (or the wanted shell path).
>
> Makes sense.
>
>> > +		echo "failed in pre-receive hook"
>> > +		exit 1
>> > +	EOF
>> > +	cat >config <<-EOF &&
>> > +		[core]
>> > +			hooksPath=$(pwd)/hooks
>> > +	EOF
>> 
>> Easier understood IMO as:
>> 
>>     git config -f config core.hooksPath ...
>
> Yup, will change.
>
>> > +	GIT_CONFIG_GLOBAL="$(pwd)/config" &&
>> > +	export GIT_CONFIG_GLOBAL &&
>> > +
>> > +	git switch --create updated &&
>> > +	test_commit --no-tag updated
>> > +'
>> > +
>> > +setup_empty() {
>> > +	git init --bare "$2"
>> > +}
>> 
>> I searched ahead for setup_empty, looked unused, but...
>> 
>> > +setup_clone() {
>> > +	git clone --bare --no-local --branch main "$1" "$2"
>> > +}
>> > +
>> > +setup_clone_bitmap() {
>> > +	git clone --bare --no-local --branch main "$1" "$2" &&
>> > +	git -C "$2" repack -Adb
>> > +}
>> > +
>> > +# Create a reference for each commit in the target repository with extra-refs.
>> > +# While this may be an atypical setup, biggish repositories easily end up with
>> > +# hundreds of thousands of refs, and this is a good enough approximation.
>> > +setup_extrarefs() {
>> > +	git clone --bare --no-local --branch main "$1" "$2" &&
>> > +	git -C "$2" log --all --format="tformat:create refs/commit/%h %H" |
>> > +		git -C "$2" update-ref --stdin
>> > +}
>> > +
>> > +# Create a reference for each commit in the target repository with extra-refs.
>> > +# While this may be an atypical setup, biggish repositories easily end up with
>> > +# hundreds of thousands of refs, and this is a good enough approximation.
>> > +setup_extrarefs_bitmap() {
>> > +	git clone --bare --no-local --branch main "$1" "$2" &&
>> > +	git -C "$2" log --all --format="tformat:create refs/commit/%h %H" |
>> > +		git -C "$2" update-ref --stdin &&
>> > +	git -C "$2" repack -Adb
>> > +}
>> > +
>> > +for repo in empty clone clone_bitmap extrarefs extrarefs_bitmap
>> > +do
>> > +	test_expect_success "$repo setup" '
>> 
>> > +		rm -rf target.git &&
>> > +		setup_$repo "$(pwd)" target.git
>> 
>> ...here we use it via interpolation.
>> 
>> I'd find this whole pattern much easier to understand if the setups were
>> just a bunch of test_expect_success that created a repo_empty.git,
>> repo_extrarefs.git etc. Then this loop would be:
>> 
>>     for repo in repo*.git ...
>> 
>> I'd think that would also give you more meaningful perf data, as now the
>> OS will churn between the clone & the subsequent push tests, better to
>> do all the setup, then all the different perf tests.
>> 
>> Perhaps there's also a way to re-use this setup across different runs, I
>> don't know/can't remember if t/perf has a less transient thing than the
>> normal trash directory to use for that.
>
> I originally had code like this, but the issue with first creating all
> the repos is that it requires lots of disk space with large repos given
> that we'll clone it once per setup. Combined with the fact that I
> often run tests in tmpfs, this led to out-of-memory situations quite
> fast given that I had 3x6GB repositories plus the seeded packfiles in
> RAM.
>
> This is why I've changed the setup to do the setup as we go, to bring
> disk usage down to something sane.
>
> Patrick
>
> [[End of PGP Signed Part]]

Ah, I see. In that case wouldn't it be even better/faster with/without
my suggestion to not use "clone" here, which would either be manually
set up with alternates, or removing the --no-local flag.

You'd then share bulk of the object database, and just have different
references. B.t.w. you'll probably get less noise/more relevant results
if you then do a "pack-refs" after creating those N references.

So you should have:

 0. Your "big" test repop (not used directly)
 1. An empty repo
 2. The "big" test repo itself, but just the HEAD branch, using
    alternates to point to #0
 3. Ditto, but we create a crapload of refs for each commit for a
    version of #2.
 4. Ditto #3 (could even "cp" over the packed refs file to save time),
    but add a bitmap on top.

Well, presumably for #4 we'd actually want to do the "git repack -Adb"
for #2 (or enforce that #0 must have it), then just move the *.bitmap
file(s) to #4. Now the test case conflates whether we have bitmaps with
how well (re)packed something is.

I think this might also allow you to get rid of the pre-receive hook for
a "real" push test, since the side-repos would be so cheap at this point
that you could perhaps setup N of them to push into.

diff --git a/t/perf/p5400-receive-pack.sh b/t/perf/p5400-receive-pack.sh
new file mode 100755
index 0000000000..a945e014a3
--- /dev/null
+++ b/t/perf/p5400-receive-pack.sh
@@ -0,0 +1,97 @@ 
+#!/bin/sh
+
+test_description="Tests performance of receive-pack"
+
+. ./perf-lib.sh
+
+test_perf_large_repo
+
+test_expect_success 'setup' '
+	# Create a main branch such that we do not have to rely on any specific
+	# branch to exist in the perf repository.
+	git switch --force-create main &&
+
+	# Set up a pre-receive hook such that no refs will ever be changed.
+	# This easily allows multiple perf runs, but still exercises
+	# server-side reference negotiation and checking for consistency.
+	mkdir hooks &&
+	write_script hooks/pre-receive <<-EOF &&
+		#!/bin/sh
+		echo "failed in pre-receive hook"
+		exit 1
+	EOF
+	cat >config <<-EOF &&
+		[core]
+			hooksPath=$(pwd)/hooks
+	EOF
+	GIT_CONFIG_GLOBAL="$(pwd)/config" &&
+	export GIT_CONFIG_GLOBAL &&
+
+	git switch --create updated &&
+	test_commit --no-tag updated
+'
+
+setup_empty() {
+	git init --bare "$2"
+}
+
+setup_clone() {
+	git clone --bare --no-local --branch main "$1" "$2"
+}
+
+setup_clone_bitmap() {
+	git clone --bare --no-local --branch main "$1" "$2" &&
+	git -C "$2" repack -Adb
+}
+
+# Create a reference for each commit in the target repository with extra-refs.
+# While this may be an atypical setup, biggish repositories easily end up with
+# hundreds of thousands of refs, and this is a good enough approximation.
+setup_extrarefs() {
+	git clone --bare --no-local --branch main "$1" "$2" &&
+	git -C "$2" log --all --format="tformat:create refs/commit/%h %H" |
+		git -C "$2" update-ref --stdin
+}
+
+# Create a reference for each commit in the target repository with extra-refs.
+# While this may be an atypical setup, biggish repositories easily end up with
+# hundreds of thousands of refs, and this is a good enough approximation.
+setup_extrarefs_bitmap() {
+	git clone --bare --no-local --branch main "$1" "$2" &&
+	git -C "$2" log --all --format="tformat:create refs/commit/%h %H" |
+		git -C "$2" update-ref --stdin &&
+	git -C "$2" repack -Adb
+}
+
+for repo in empty clone clone_bitmap extrarefs extrarefs_bitmap
+do
+	test_expect_success "$repo setup" '
+		rm -rf target.git &&
+		setup_$repo "$(pwd)" target.git
+	'
+
+	# If the target repository is the empty one, then the only thing we can
+	# do is to create a new branch.
+	case "$repo" in
+	empty)
+		refspecs="updated:new";;
+	*)
+		refspecs="updated:new updated:main main~10:main :main";;
+	esac
+
+	for refspec in $refspecs
+	do
+		test_expect_success "$repo seed $refspec" "
+			test_must_fail git push --force target.git '$refspec' \
+				--receive-pack='tee pack | git receive-pack' 2>err &&
+			grep 'failed in pre-receive hook' err
+		"
+
+		test_perf "$repo receive-pack $refspec" "
+			git receive-pack target.git <pack >negotiation &&
+			grep 'pre-receive hook declined' negotiation
+		"
+	done
+done
+
+test_done

[v2,1/3] p5400: add perf tests for git-receive-pack(1)

Commit Message

Comments

Patch