commit-graph write: use pack order when finding commits

Message ID	20190117132345.29791-1-avarab@gmail.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <git-owner@kernel.org> From: =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsCBCamFybWFzb24=?= <avarab@gmail.com> To: git@vger.kernel.org Cc: Junio C Hamano <gitster@pobox.com>, Jeff King <peff@peff.net>, =?utf-8?b?Tmd1eeG7hW4gVGjDoWkgTmfhu41jIER1eQ==?= <pclouds@gmail.com>, =?utf-8?q?SZEDER_G=C3=A1bor?= <szeder.dev@gmail.com>, Eric Sunshine <sunshine@sunshineco.com>, Derrick Stolee <stolee@gmail.com>, =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsCBCamFybWFzb24=?= <avarab@gmail.com> Subject: [PATCH] commit-graph write: use pack order when finding commits Date: Thu, 17 Jan 2019 14:23:45 +0100 Message-Id: <20190117132345.29791-1-avarab@gmail.com> In-Reply-To: <20190116132947.26025-10-avarab@gmail.com> References: <20190116132947.26025-10-avarab@gmail.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: git-owner@vger.kernel.org Precedence: bulk
Series	commit-graph write: use pack order when finding commits \| expand commit-graph write: use pack order when finding commits

Message ID

20190117132345.29791-1-avarab@gmail.com (mailing list archive)

State

New, archived

Headers

From: =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsCBCamFybWFzb24=?=  <avarab@gmail.com>
To: git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>, Jeff King <peff@peff.net>,
	=?utf-8?b?Tmd1eeG7hW4gVGjDoWkgTmfhu41jIER1eQ==?=  <pclouds@gmail.com>,
	=?utf-8?q?SZEDER_G=C3=A1bor?= <szeder.dev@gmail.com>,
 Eric Sunshine <sunshine@sunshineco.com>, Derrick Stolee <stolee@gmail.com>,
	=?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsCBCamFybWFzb24=?=  <avarab@gmail.com>
Subject: [PATCH] commit-graph write: use pack order when finding commits
Date: Thu, 17 Jan 2019 14:23:45 +0100
Message-Id: <20190117132345.29791-1-avarab@gmail.com>
In-Reply-To: <20190116132947.26025-10-avarab@gmail.com>
References: <20190116132947.26025-10-avarab@gmail.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: git-owner@vger.kernel.org
Precedence: bulk

Series

commit-graph write: use pack order when finding commits | expand

Commit Message

Ævar Arnfjörð Bjarmason Jan. 17, 2019, 1:23 p.m. UTC

Slightly optimize the "commit-graph write" step by using
FOR_EACH_OBJECT_PACK_ORDER with for_each_object_in_pack(). See commit
[1] and [2] for the facility and a similar optimization for "cat-file".

On Linux it is around 5% slower to run:

    echo 1 >/proc/sys/vm/drop_caches &&
    cat .git/objects/pack/* >/dev/null &&
    git cat-file --batch-all-objects --batch-check --unordered

Than the same thing with the "cat" omitted. This is as expected, since
we're iterating in pack order and the "cat" is extra work.

Before this change the opposite was true of "commit-graph write". We
were 6% faster if we first ran "cat" to efficiently populate the FS
cache for our sole big pack on linux.git than if we had it populated
via for_each_object_in_pack(). Now we're 3% faster without the "cat"
instead.

The tests were done on an unloaded Linux 3.10 system with 10 runs for
each.

1. 736eb88fdc ("for_each_packed_object: support iterating in
   pack-order", 2018-08-10)

2. 0750bb5b51 ("cat-file: support "unordered" output for
   --batch-all-objects", 2018-08-10)
---

This is conceptually different from my "commit-graph write: progress
output improvements" series but conflicts with it, so it's based on
top of it. This patch goes after its 9/9.

 commit-graph.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Comments

Derrick Stolee Jan. 17, 2019, 3:09 p.m. UTC | #1

On 1/17/2019 8:23 AM, Ævar Arnfjörð Bjarmason wrote:
> Slightly optimize the "commit-graph write" step by using
> FOR_EACH_OBJECT_PACK_ORDER with for_each_object_in_pack(). See commit
> [1] and [2] for the facility and a similar optimization for "cat-file".
>
> On Linux it is around 5% slower to run:
>
>      echo 1 >/proc/sys/vm/drop_caches &&
>      cat .git/objects/pack/* >/dev/null &&
>      git cat-file --batch-all-objects --batch-check --unordered
>
> Than the same thing with the "cat" omitted. This is as expected, since
> we're iterating in pack order and the "cat" is extra work.
>
> Before this change the opposite was true of "commit-graph write". We
> were 6% faster if we first ran "cat" to efficiently populate the FS
> cache for our sole big pack on linux.git than if we had it populated
> via for_each_object_in_pack(). Now we're 3% faster without the "cat"
> instead.
>
> The tests were done on an unloaded Linux 3.10 system with 10 runs for
> each.
>
> 1. 736eb88fdc ("for_each_packed_object: support iterating in
>     pack-order", 2018-08-10)
>
> 2. 0750bb5b51 ("cat-file: support "unordered" output for
>     --batch-all-objects", 2018-08-10)
Thanks, Aevar! I plan to give this a test on the Windows repository, as 
this is the way we generate the commit-graph in VFS for Git.

I created a PR on microsoft/git [1] and am generating the installers now [2]

[1] https://github.com/Microsoft/git/pull/110

[2] https://dev.azure.com/gvfs/ci/_build/results?buildId=6114

The code itself looks good, I just need to double-check the performance 
numbers.

Thanks,
-Stolee

Derrick Stolee Jan. 17, 2019, 4:35 p.m. UTC | #2

On 1/17/2019 10:09 AM, Derrick Stolee wrote:
>
> The code itself looks good, I just need to double-check the 
> performance numbers.

I tested this patch on our VFS-specific version using five "prefetch 
packs".

Before: 22.61 s (avg)
After: 22.18 s (avg)

Improvement: 2%

The error rate among the samples is very low (~.15 s) so this result is 
significant.

Considering that we are writing 4.4 million commits -- many more than 
the objects in the packs that we are reading -- this is an excellent 
improvement!

Thanks!
-Stolee

diff --git a/commit-graph.c b/commit-graph.c
index b6a074c80d..68124e8385 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -839,7 +839,8 @@  void write_commit_graph(const char *obj_dir,
 				die(_("error adding pack %s"), packname.buf);
 			if (open_pack_index(p))
 				die(_("error opening index for %s"), packname.buf);
-			for_each_object_in_pack(p, add_packed_commits, &oids, 0);
+			for_each_object_in_pack(p, add_packed_commits, &oids,
+						FOR_EACH_OBJECT_PACK_ORDER);
 			close_pack(p);
 			free(p);
 		}
@@ -885,7 +886,8 @@  void write_commit_graph(const char *obj_dir,
 			oids.progress = start_delayed_progress(
 				_("Finding commits for commit graph among packed objects"),
 				approx_nr_objects);
-		for_each_packed_object(add_packed_commits, &oids, 0);
+		for_each_packed_object(add_packed_commits, &oids,
+				       FOR_EACH_OBJECT_PACK_ORDER);
 		if (oids.progress_done < approx_nr_objects)
 			display_progress(oids.progress, approx_nr_objects);
 		stop_progress(&oids.progress);

commit-graph write: use pack order when finding commits

Commit Message

Comments

Patch