Message ID | 20190117132345.29791-1-avarab@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | commit-graph write: use pack order when finding commits | expand |
On 1/17/2019 8:23 AM, Ævar Arnfjörð Bjarmason wrote: > Slightly optimize the "commit-graph write" step by using > FOR_EACH_OBJECT_PACK_ORDER with for_each_object_in_pack(). See commit > [1] and [2] for the facility and a similar optimization for "cat-file". > > On Linux it is around 5% slower to run: > > echo 1 >/proc/sys/vm/drop_caches && > cat .git/objects/pack/* >/dev/null && > git cat-file --batch-all-objects --batch-check --unordered > > Than the same thing with the "cat" omitted. This is as expected, since > we're iterating in pack order and the "cat" is extra work. > > Before this change the opposite was true of "commit-graph write". We > were 6% faster if we first ran "cat" to efficiently populate the FS > cache for our sole big pack on linux.git than if we had it populated > via for_each_object_in_pack(). Now we're 3% faster without the "cat" > instead. > > The tests were done on an unloaded Linux 3.10 system with 10 runs for > each. > > 1. 736eb88fdc ("for_each_packed_object: support iterating in > pack-order", 2018-08-10) > > 2. 0750bb5b51 ("cat-file: support "unordered" output for > --batch-all-objects", 2018-08-10) Thanks, Aevar! I plan to give this a test on the Windows repository, as this is the way we generate the commit-graph in VFS for Git. I created a PR on microsoft/git [1] and am generating the installers now [2] [1] https://github.com/Microsoft/git/pull/110 [2] https://dev.azure.com/gvfs/ci/_build/results?buildId=6114 The code itself looks good, I just need to double-check the performance numbers. Thanks, -Stolee
On 1/17/2019 10:09 AM, Derrick Stolee wrote: > > The code itself looks good, I just need to double-check the > performance numbers. I tested this patch on our VFS-specific version using five "prefetch packs". Before: 22.61 s (avg) After: 22.18 s (avg) Improvement: 2% The error rate among the samples is very low (~.15 s) so this result is significant. Considering that we are writing 4.4 million commits -- many more than the objects in the packs that we are reading -- this is an excellent improvement! Thanks! -Stolee
diff --git a/commit-graph.c b/commit-graph.c index b6a074c80d..68124e8385 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -839,7 +839,8 @@ void write_commit_graph(const char *obj_dir, die(_("error adding pack %s"), packname.buf); if (open_pack_index(p)) die(_("error opening index for %s"), packname.buf); - for_each_object_in_pack(p, add_packed_commits, &oids, 0); + for_each_object_in_pack(p, add_packed_commits, &oids, + FOR_EACH_OBJECT_PACK_ORDER); close_pack(p); free(p); } @@ -885,7 +886,8 @@ void write_commit_graph(const char *obj_dir, oids.progress = start_delayed_progress( _("Finding commits for commit graph among packed objects"), approx_nr_objects); - for_each_packed_object(add_packed_commits, &oids, 0); + for_each_packed_object(add_packed_commits, &oids, + FOR_EACH_OBJECT_PACK_ORDER); if (oids.progress_done < approx_nr_objects) display_progress(oids.progress, approx_nr_objects); stop_progress(&oids.progress);