diff mbox series

[v2,1/2] commit-graph write: add progress output

Message ID 20180907182954.2413-2-avarab@gmail.com (mailing list archive)
State New, archived
Headers show
Series commit-graph: add progress output | expand

Commit Message

Ævar Arnfjörð Bjarmason Sept. 7, 2018, 6:29 p.m. UTC
Before this change the "commit-graph write" command didn't report any
progress. On my machine this command takes more than 10 seconds to
write the graph for linux.git, and around 1m30s on the
2015-04-03-1M-git.git[1] test repository (a test case for a large
monorepository).

Furthermore, since the gc.writeCommitGraph setting was added in
d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27),
there was no indication at all from a "git gc" run that anything was
different. This why one of the progress bars being added here uses
start_progress() instead of start_delayed_progress(), so that it's
guaranteed to be seen. E.g. on my tiny 867 commit dotfiles.git
repository:

    $ git -c gc.writeCommitGraph=true gc
    Enumerating objects: 2821, done.
    [...]
    Computing commit graph generation numbers: 100% (867/867), done.

On larger repositories, such as linux.git the delayed progress bar(s)
will kick in, and we'll show what's going on instead of, as was
previously happening, printing nothing while we write the graph:

    $ git -c gc.writeCommitGraph=true gc
    [...]
    Annotating commits in commit graph: 1565573, done.
    Computing commit graph generation numbers: 100% (782484/782484), done.

Note that here we don't show "Finding commits for commit graph", this
is because under "git gc" we seed the search with the commit
references in the repository, and that set is too small to show any
progress, but would e.g. on a smaller repo such as git.git with
--stdin-commits:

    $ git rev-list --all | git -c gc.writeCommitGraph=true write --stdin-commits
    Finding commits for commit graph: 100% (162576/162576), done.
    Computing commit graph generation numbers: 100% (162576/162576), done.

With --stdin-packs we don't show any estimation of how much is left to
do. This is because we might be processing more than one pack. We
could be less lazy here and show progress, either by detecting that
we're only processing one pack, or by first looping over the packs to
discover how many commits they have. I don't see the point in doing
that work. So instead we get (on 2015-04-03-1M-git.git):

    $ echo pack-<HASH>.idx | git -c gc.writeCommitGraph=true --exec-path=$PWD commit-graph write --stdin-packs
    Finding commits for commit graph: 13064614, done.
    Annotating commits in commit graph: 3001341, done.
    Computing commit graph generation numbers: 100% (1000447/1000447), done.

No GC mode uses --stdin-packs. It's what they use at Microsoft to
manually compute the generation numbers for their collection of large
packs which are never coalesced.

The reason we need a "report_progress" variable passed down from "git
gc" is so that we don't report this output when we're running in the
process "git gc --auto" detaches from the terminal.

Since we write the commit graph from the "git gc" process itself (as
opposed to what we do with say the "git repack" phase), we'd end up
writing the output to .git/gc.log and reporting it to the user next
time as part of the "The last gc run reported the following[...]"
error, see 329e6e8794 ("gc: save log from daemonized gc --auto and
print it next time", 2015-09-19).

So we must keep track of whether or not we're running in that
demonized mode, and if so print no progress.

See [2] and subsequent replies for a discussion of an approach not
taken in compute_generation_numbers(). I.e. we're saying "Computing
commit graph generation numbers", even though on an established
history we're mostly skipping over all the work we did in the
past. This is similar to the white lie we tell in the "Writing
objects" phase (not all are objects being written).

Always showing progress is considered more important than
accuracy. I.e. on a repository like 2015-04-03-1M-git.git we'd hang
for 6 seconds with no output on the second "git gc" if no changes were
made to any objects in the interim if we'd take the approach in [2].

1. https://github.com/avar/2015-04-03-1M-git

2. <c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com>
   (https://public-inbox.org/git/c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com/)

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/commit-graph.c |  5 ++--
 builtin/gc.c           |  3 ++-
 commit-graph.c         | 60 ++++++++++++++++++++++++++++++++++++------
 commit-graph.h         |  5 ++--
 4 files changed, 60 insertions(+), 13 deletions(-)

Comments

Derrick Stolee Sept. 21, 2018, 8:01 p.m. UTC | #1
On 9/7/2018 2:29 PM, Ævar Arnfjörð Bjarmason wrote:
> -void write_commit_graph_reachable(const char *obj_dir, int append);
> +void write_commit_graph_reachable(const char *obj_dir, int append,
> +				  int report_progress);
>   void write_commit_graph(const char *obj_dir,
>   			struct string_list *pack_indexes,
>   			struct string_list *commit_hex,
> -			int append);
> +			int append, int report_progress);
>   
>   int verify_commit_graph(struct repository *r, struct commit_graph *g);
>   

Junio,

The above prototype change seems to have created a semantic conflict 
with ds/commit-graph-tests (859fdc "commit-graph: define 
GIT_TEST_COMMIT_GRAPH") because when GIT_TEST_COMMIT_GRAPH is set, we 
call write_commit_graph_reachable() but the final parameter was resolved 
to be "1" instead of "0".

This causes t3420-rebase-autostash.sh to fail, as that test watches the 
full output of the rebase command, including commit runs. The following 
patch fixes the problem, but could probably be squashed into a merge or 
other commit.

Thanks,

-Stolee

-->8--

From: Derrick Stolee <dstolee@microsoft.com>
Date: Fri, 21 Sep 2018 19:57:36 +0000
Subject: [PATCH] commit: quietly write commit-graph in tests

The GIT_TEST_COMMIT_GRAPH environment variable causes git-commit to
write a commit-graph file on every execution. Recently, we added
progress output when writing the commit-graph. This conflicts with
some expected output during some tests, so avoid writing progress
if writing a commit-graph this way.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
  builtin/commit.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index 2a49ab4917..764664d832 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1660,7 +1660,7 @@ int cmd_commit(int argc, const char **argv, const 
char *prefix)
                       "not exceeded, and then \"git reset HEAD\" to 
recover."));

         if (git_env_bool(GIT_TEST_COMMIT_GRAPH, 0))
- write_commit_graph_reachable(get_object_directory(), 0, 1);
+ write_commit_graph_reachable(get_object_directory(), 0, 0);

         rerere(0);
         run_command_v_opt(argv_gc_auto, RUN_GIT_CMD);
--
2.19.0
Junio C Hamano Sept. 21, 2018, 9:43 p.m. UTC | #2
Derrick Stolee <stolee@gmail.com> writes:

> On 9/7/2018 2:29 PM, Ævar Arnfjörð Bjarmason wrote:
>> -void write_commit_graph_reachable(const char *obj_dir, int append);
>> +void write_commit_graph_reachable(const char *obj_dir, int append,
>> +				  int report_progress);
>>   void write_commit_graph(const char *obj_dir,
>>   			struct string_list *pack_indexes,
>>   			struct string_list *commit_hex,
>> -			int append);
>> +			int append, int report_progress);
>>     int verify_commit_graph(struct repository *r, struct
>> commit_graph *g);
>>   
>
> Junio,
>
> The above prototype change seems to have created a semantic conflict
> with ds/commit-graph-tests (859fdc "commit-graph: define
> GIT_TEST_COMMIT_GRAPH") because when GIT_TEST_COMMIT_GRAPH is set, we
> call write_commit_graph_reachable() but the final parameter was
> resolved to be "1" instead of "0".

Hmph.  That's unfortunate.

Perhaps one of the topics should have yielded and waited until the
other one passes through.

As 859fdc0c ("commit-graph: define GIT_TEST_COMMIT_GRAPH",
2018-08-29) already is in 'master', the other "progress" topic
probably should be corrected to match.  The easiest and cleanest
would be to eject the ab/commit-graph-progress topic out of 'next'
and have it rerolled on top of 'master', as we are going to rewind
the tip of 'next' anyway.

While we are at it, I suspect that a saner evolution of the API into
the function would not append more parameters to the call, but would
make the "do we append?" bit into a flag word "unsigned flags" with
two bits, and such a clean-up can be done as a preliminary change.

> This causes t3420-rebase-autostash.sh to fail, as that test watches
> the full output of the rebase command, including commit runs. The
> following patch fixes the problem, but could probably be squashed into
> a merge or other commit.
>
> Thanks,
>
> -Stolee
>
> -->8--
>
> From: Derrick Stolee <dstolee@microsoft.com>
> Date: Fri, 21 Sep 2018 19:57:36 +0000
> Subject: [PATCH] commit: quietly write commit-graph in tests
>
> The GIT_TEST_COMMIT_GRAPH environment variable causes git-commit to
> write a commit-graph file on every execution. Recently, we added
> progress output when writing the commit-graph. This conflicts with
> some expected output during some tests, so avoid writing progress
> if writing a commit-graph this way.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  builtin/commit.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/builtin/commit.c b/builtin/commit.c
> index 2a49ab4917..764664d832 100644
> --- a/builtin/commit.c
> +++ b/builtin/commit.c
> @@ -1660,7 +1660,7 @@ int cmd_commit(int argc, const char **argv,
> const char *prefix)
>                       "not exceeded, and then \"git reset HEAD\" to
> recover."));
>
>         if (git_env_bool(GIT_TEST_COMMIT_GRAPH, 0))
> - write_commit_graph_reachable(get_object_directory(), 0, 1);
> + write_commit_graph_reachable(get_object_directory(), 0, 0);
>
>         rerere(0);
>         run_command_v_opt(argv_gc_auto, RUN_GIT_CMD);
> --
> 2.19.0
Junio C Hamano Sept. 21, 2018, 9:57 p.m. UTC | #3
Junio C Hamano <gitster@pobox.com> writes:

>> The above prototype change seems to have created a semantic conflict
>> with ds/commit-graph-tests (859fdc "commit-graph: define
>> GIT_TEST_COMMIT_GRAPH") because when GIT_TEST_COMMIT_GRAPH is set, we
>> call write_commit_graph_reachable() but the final parameter was
>> resolved to be "1" instead of "0".
>
> Hmph.  That's unfortunate.
>
> Perhaps one of the topics should have yielded and waited until the
> other one passes through.

Nah, I see where things went wrong.  I'll queue a single-liner
"mismerge fix" to 'next', and then correct the seed for the evil
merge kept in merge-fix/ab/commit-graph-progress, and rebuild 'pu'.
Things will straighten out by themselves after that happens.

Thanks for noticing.
diff mbox series

Patch

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 0bf0c48657..bc0fa9ba52 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -151,7 +151,7 @@  static int graph_write(int argc, const char **argv)
 		opts.obj_dir = get_object_directory();
 
 	if (opts.reachable) {
-		write_commit_graph_reachable(opts.obj_dir, opts.append);
+		write_commit_graph_reachable(opts.obj_dir, opts.append, 1);
 		return 0;
 	}
 
@@ -171,7 +171,8 @@  static int graph_write(int argc, const char **argv)
 	write_commit_graph(opts.obj_dir,
 			   pack_indexes,
 			   commit_hex,
-			   opts.append);
+			   opts.append,
+			   1);
 
 	string_list_clear(&lines, 0);
 	return 0;
diff --git a/builtin/gc.c b/builtin/gc.c
index 57069442b0..06ddd3aea5 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -646,7 +646,8 @@  int cmd_gc(int argc, const char **argv, const char *prefix)
 		clean_pack_garbage();
 
 	if (gc_write_commit_graph)
-		write_commit_graph_reachable(get_object_directory(), 0);
+		write_commit_graph_reachable(get_object_directory(), 0,
+					     !daemonized);
 
 	if (auto_gc && too_many_loose_objects())
 		warning(_("There are too many unreachable loose objects; "
diff --git a/commit-graph.c b/commit-graph.c
index 8a1bec7b8a..2c5d996194 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -13,6 +13,7 @@ 
 #include "commit-graph.h"
 #include "object-store.h"
 #include "alloc.h"
+#include "progress.h"
 
 #define GRAPH_SIGNATURE 0x43475048 /* "CGPH" */
 #define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
@@ -548,6 +549,8 @@  struct packed_oid_list {
 	struct object_id *list;
 	int nr;
 	int alloc;
+	struct progress *progress;
+	int progress_done;
 };
 
 static int add_packed_commits(const struct object_id *oid,
@@ -560,6 +563,9 @@  static int add_packed_commits(const struct object_id *oid,
 	off_t offset = nth_packed_object_offset(pack, pos);
 	struct object_info oi = OBJECT_INFO_INIT;
 
+	if (list->progress)
+		display_progress(list->progress, ++list->progress_done);
+
 	oi.typep = &type;
 	if (packed_object_info(the_repository, pack, offset, &oi) < 0)
 		die(_("unable to get type of object %s"), oid_to_hex(oid));
@@ -587,12 +593,18 @@  static void add_missing_parents(struct packed_oid_list *oids, struct commit *com
 	}
 }
 
-static void close_reachable(struct packed_oid_list *oids)
+static void close_reachable(struct packed_oid_list *oids, int report_progress)
 {
 	int i;
 	struct commit *commit;
+	struct progress *progress = NULL;
+	int j = 0;
 
+	if (report_progress)
+		progress = start_delayed_progress(
+			_("Annotating commits in commit graph"), 0);
 	for (i = 0; i < oids->nr; i++) {
+		display_progress(progress, ++j);
 		commit = lookup_commit(the_repository, &oids->list[i]);
 		if (commit)
 			commit->object.flags |= UNINTERESTING;
@@ -604,6 +616,7 @@  static void close_reachable(struct packed_oid_list *oids)
 	 * closure.
 	 */
 	for (i = 0; i < oids->nr; i++) {
+		display_progress(progress, ++j);
 		commit = lookup_commit(the_repository, &oids->list[i]);
 
 		if (commit && !parse_commit(commit))
@@ -611,19 +624,28 @@  static void close_reachable(struct packed_oid_list *oids)
 	}
 
 	for (i = 0; i < oids->nr; i++) {
+		display_progress(progress, ++j);
 		commit = lookup_commit(the_repository, &oids->list[i]);
 
 		if (commit)
 			commit->object.flags &= ~UNINTERESTING;
 	}
+	stop_progress(&progress);
 }
 
-static void compute_generation_numbers(struct packed_commit_list* commits)
+static void compute_generation_numbers(struct packed_commit_list* commits, 
+				       int report_progress)
 {
 	int i;
 	struct commit_list *list = NULL;
+	struct progress *progress = NULL;
 
+	if (report_progress)
+		progress = start_progress(
+			_("Computing commit graph generation numbers"),
+			commits->nr);
 	for (i = 0; i < commits->nr; i++) {
+		display_progress(progress, i + 1);
 		if (commits->list[i]->generation != GENERATION_NUMBER_INFINITY &&
 		    commits->list[i]->generation != GENERATION_NUMBER_ZERO)
 			continue;
@@ -655,6 +677,7 @@  static void compute_generation_numbers(struct packed_commit_list* commits)
 			}
 		}
 	}
+	stop_progress(&progress);
 }
 
 static int add_ref_to_list(const char *refname,
@@ -667,19 +690,20 @@  static int add_ref_to_list(const char *refname,
 	return 0;
 }
 
-void write_commit_graph_reachable(const char *obj_dir, int append)
+void write_commit_graph_reachable(const char *obj_dir, int append,
+				  int report_progress)
 {
 	struct string_list list;
 
 	string_list_init(&list, 1);
 	for_each_ref(add_ref_to_list, &list);
-	write_commit_graph(obj_dir, NULL, &list, append);
+	write_commit_graph(obj_dir, NULL, &list, append, report_progress);
 }
 
 void write_commit_graph(const char *obj_dir,
 			struct string_list *pack_indexes,
 			struct string_list *commit_hex,
-			int append)
+			int append, int report_progress)
 {
 	struct packed_oid_list oids;
 	struct packed_commit_list commits;
@@ -692,9 +716,12 @@  void write_commit_graph(const char *obj_dir,
 	int num_chunks;
 	int num_extra_edges;
 	struct commit_list *parent;
+	struct progress *progress = NULL;
 
 	oids.nr = 0;
 	oids.alloc = approximate_object_count() / 4;
+	oids.progress = NULL;
+	oids.progress_done = 0;
 
 	if (append) {
 		prepare_commit_graph_one(the_repository, obj_dir);
@@ -721,6 +748,11 @@  void write_commit_graph(const char *obj_dir,
 		int dirlen;
 		strbuf_addf(&packname, "%s/pack/", obj_dir);
 		dirlen = packname.len;
+		if (report_progress) {
+			oids.progress = start_delayed_progress(
+				_("Finding commits for commit graph"), 0);
+			oids.progress_done = 0;
+		}
 		for (i = 0; i < pack_indexes->nr; i++) {
 			struct packed_git *p;
 			strbuf_setlen(&packname, dirlen);
@@ -733,15 +765,21 @@  void write_commit_graph(const char *obj_dir,
 			for_each_object_in_pack(p, add_packed_commits, &oids, 0);
 			close_pack(p);
 		}
+		stop_progress(&oids.progress);
 		strbuf_release(&packname);
 	}
 
 	if (commit_hex) {
+		if (report_progress)
+			progress = start_delayed_progress(
+				_("Finding commits for commit graph"),
+				commit_hex->nr);
 		for (i = 0; i < commit_hex->nr; i++) {
 			const char *end;
 			struct object_id oid;
 			struct commit *result;
 
+			display_progress(progress, i + 1);
 			if (commit_hex->items[i].string &&
 			    parse_oid_hex(commit_hex->items[i].string, &oid, &end))
 				continue;
@@ -754,12 +792,18 @@  void write_commit_graph(const char *obj_dir,
 				oids.nr++;
 			}
 		}
+		stop_progress(&progress);
 	}
 
-	if (!pack_indexes && !commit_hex)
+	if (!pack_indexes && !commit_hex) {
+		if (report_progress)
+			oids.progress = start_delayed_progress(
+				_("Finding commits for commit graph"), 0);
 		for_each_packed_object(add_packed_commits, &oids, 0);
+		stop_progress(&oids.progress);
+	}
 
-	close_reachable(&oids);
+	close_reachable(&oids, report_progress);
 
 	QSORT(oids.list, oids.nr, commit_compare);
 
@@ -799,7 +843,7 @@  void write_commit_graph(const char *obj_dir,
 	if (commits.nr >= GRAPH_PARENT_MISSING)
 		die(_("too many commits to write graph"));
 
-	compute_generation_numbers(&commits);
+	compute_generation_numbers(&commits, report_progress);
 
 	graph_name = get_commit_graph_filename(obj_dir);
 	if (safe_create_leading_directories(graph_name))
diff --git a/commit-graph.h b/commit-graph.h
index eea62f8c0e..f50712a973 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -52,11 +52,12 @@  struct commit_graph {
 
 struct commit_graph *load_commit_graph_one(const char *graph_file);
 
-void write_commit_graph_reachable(const char *obj_dir, int append);
+void write_commit_graph_reachable(const char *obj_dir, int append,
+				  int report_progress);
 void write_commit_graph(const char *obj_dir,
 			struct string_list *pack_indexes,
 			struct string_list *commit_hex,
-			int append);
+			int append, int report_progress);
 
 int verify_commit_graph(struct repository *r, struct commit_graph *g);