commit-graph: don't show progress percentages while expanding reachable commits
diff mbox series

Message ID 20190322102817.19708-1-szeder.dev@gmail.com
State New
Headers show
Series
  • commit-graph: don't show progress percentages while expanding reachable commits
Related show

Commit Message

SZEDER Gábor March 22, 2019, 10:28 a.m. UTC
Commit 49bbc57a57 (commit-graph write: emit a percentage for all
progress, 2019-01-19) was a bit overeager when it added progress
percentages to the "Expanding reachable commits in commit graph" phase
as well, because most of the time the number of commits that phase has
to iterate over is not known in advance and grows significantly, and,
consequently, we end up with nonsensical numbers:

  $ git commit-graph write --reachable
  Expanding reachable commits in commit graph: 138606% (824706/595), done.
  [...]

  $ git rev-parse v5.0 | git commit-graph write --stdin-commits
  Expanding reachable commits in commit graph: 81264400% (812644/1), done.
  [...]

Therefore, don't show progress percentages in the "Expanding reachable
commits in commit graph" phase.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 commit-graph.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Ævar Arnfjörð Bjarmason March 22, 2019, 11:11 a.m. UTC | #1
On Fri, Mar 22 2019, SZEDER Gábor wrote:

> Commit 49bbc57a57 (commit-graph write: emit a percentage for all
> progress, 2019-01-19) was a bit overeager when it added progress
> percentages to the "Expanding reachable commits in commit graph" phase
> as well, because most of the time the number of commits that phase has
> to iterate over is not known in advance and grows significantly, and,
> consequently, we end up with nonsensical numbers:
>
>   $ git commit-graph write --reachable
>   Expanding reachable commits in commit graph: 138606% (824706/595), done.
>   [...]
>
>   $ git rev-parse v5.0 | git commit-graph write --stdin-commits
>   Expanding reachable commits in commit graph: 81264400% (812644/1), done.
>   [...]
>
> Therefore, don't show progress percentages in the "Expanding reachable
> commits in commit graph" phase.

There's indeed a bug here as your examples show, but there *are* cases
where it's correct, as the commit message for my patch on "master" shows
there's cases where we correctly.

So this "fixes" things by always removing the progress, why not instead
pass down the state to close_reachable() about what we're walking over,
so we can always show progress, or at least in some cases?

> Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
> ---
>  commit-graph.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/commit-graph.c b/commit-graph.c
> index 017225ccea..60c06ce58f 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -672,13 +672,13 @@ static void close_reachable(struct packed_oid_list *oids, int report_progress)
>  	 * As this loop runs, oids->nr may grow, but not more
>  	 * than the number of missing commits in the reachable
>  	 * closure.
>  	 */
>  	if (report_progress)
>  		progress = start_delayed_progress(
> -			_("Expanding reachable commits in commit graph"), oids->nr);
> +			_("Expanding reachable commits in commit graph"), 0);
>  	for (i = 0; i < oids->nr; i++) {
>  		display_progress(progress, i + 1);
>  		commit = lookup_commit(the_repository, &oids->list[i]);
>
>  		if (commit && !parse_commit(commit))
>  			add_missing_parents(oids, commit);
SZEDER Gábor March 22, 2019, 11:18 a.m. UTC | #2
On Fri, Mar 22, 2019 at 12:11:26PM +0100, Ævar Arnfjörð Bjarmason wrote:
> 
> On Fri, Mar 22 2019, SZEDER Gábor wrote:
> 
> > Commit 49bbc57a57 (commit-graph write: emit a percentage for all
> > progress, 2019-01-19) was a bit overeager when it added progress
> > percentages to the "Expanding reachable commits in commit graph" phase
> > as well, because most of the time the number of commits that phase has
> > to iterate over is not known in advance and grows significantly, and,
> > consequently, we end up with nonsensical numbers:
> >
> >   $ git commit-graph write --reachable
> >   Expanding reachable commits in commit graph: 138606% (824706/595), done.
> >   [...]
> >
> >   $ git rev-parse v5.0 | git commit-graph write --stdin-commits
> >   Expanding reachable commits in commit graph: 81264400% (812644/1), done.
> >   [...]
> >
> > Therefore, don't show progress percentages in the "Expanding reachable
> > commits in commit graph" phase.
> 
> There's indeed a bug here as your examples show, but there *are* cases
> where it's correct, as the commit message for my patch on "master" shows
> there's cases where we correctly.
> 
> So this "fixes" things by always removing the progress, why not instead
> pass down the state to close_reachable() about what we're walking over,
> so we can always show progress, or at least in some cases?

The cases where it does display correct percentages are exceptional,
and doesn't worth the effort to try to find out whether ther current
operation happens to be such a case.

> > Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
> > ---
> >  commit-graph.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/commit-graph.c b/commit-graph.c
> > index 017225ccea..60c06ce58f 100644
> > --- a/commit-graph.c
> > +++ b/commit-graph.c
> > @@ -672,13 +672,13 @@ static void close_reachable(struct packed_oid_list *oids, int report_progress)
> >  	 * As this loop runs, oids->nr may grow, but not more
> >  	 * than the number of missing commits in the reachable
> >  	 * closure.
> >  	 */
> >  	if (report_progress)
> >  		progress = start_delayed_progress(
> > -			_("Expanding reachable commits in commit graph"), oids->nr);
> > +			_("Expanding reachable commits in commit graph"), 0);
> >  	for (i = 0; i < oids->nr; i++) {
> >  		display_progress(progress, i + 1);
> >  		commit = lookup_commit(the_repository, &oids->list[i]);
> >
> >  		if (commit && !parse_commit(commit))
> >  			add_missing_parents(oids, commit);
Ævar Arnfjörð Bjarmason March 22, 2019, 2:28 p.m. UTC | #3
On Fri, Mar 22 2019, SZEDER Gábor wrote:

> On Fri, Mar 22, 2019 at 12:11:26PM +0100, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Fri, Mar 22 2019, SZEDER Gábor wrote:
>>
>> > Commit 49bbc57a57 (commit-graph write: emit a percentage for all
>> > progress, 2019-01-19) was a bit overeager when it added progress
>> > percentages to the "Expanding reachable commits in commit graph" phase
>> > as well, because most of the time the number of commits that phase has
>> > to iterate over is not known in advance and grows significantly, and,
>> > consequently, we end up with nonsensical numbers:
>> >
>> >   $ git commit-graph write --reachable
>> >   Expanding reachable commits in commit graph: 138606% (824706/595), done.
>> >   [...]
>> >
>> >   $ git rev-parse v5.0 | git commit-graph write --stdin-commits
>> >   Expanding reachable commits in commit graph: 81264400% (812644/1), done.
>> >   [...]
>> >
>> > Therefore, don't show progress percentages in the "Expanding reachable
>> > commits in commit graph" phase.
>>
>> There's indeed a bug here as your examples show, but there *are* cases
>> where it's correct, as the commit message for my patch on "master" shows
>> there's cases where we correctly.
>>
>> So this "fixes" things by always removing the progress, why not instead
>> pass down the state to close_reachable() about what we're walking over,
>> so we can always show progress, or at least in some cases?
>
> The cases where it does display correct percentages are exceptional,
> and doesn't worth the effort to try to find out whether ther current
> operation happens to be such a case.

It's the "write" entry point without arguments that displays the correct
progress. So not exceptional, but yeah, it's not what we use on "gc".

In any case, the problem is that sometimes we've walked the full set of
commits already, and some other times we haven't.

So in cases where we have we can show progress, and as a TODO (I think
this came up in previous discussions), we could do better if we had a
approximate_commit_count().

In any case, the below fix seems correct to me, but I haven't poked it
much. It *does* suffer from a theoretical race with the progress bar
similar to d9b1b309cf ("commit-graph write: show progress for object
search", 2019-01-19), but I work around it in the same way:

diff --git a/commit-graph.c b/commit-graph.c
index 47e9be0a3a..0fab3d8b2b 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -693,7 +693,8 @@ static void add_missing_parents(struct packed_oid_list *oids, struct commit *com
 	}
 }
 
-static void close_reachable(struct packed_oid_list *oids, int report_progress)
+static void close_reachable(struct packed_oid_list *oids, int report_progress,
+			    uint64_t oids_count_for_progress)
 {
 	int i;
 	struct commit *commit;
@@ -717,7 +718,8 @@ static void close_reachable(struct packed_oid_list *oids, int report_progress)
 	 */
 	if (report_progress)
 		progress = start_delayed_progress(
-			_("Expanding reachable commits in commit graph"), oids->nr);
+			_("Expanding reachable commits in commit graph"),
+			oids_count_for_progress);
 	for (i = 0; i < oids->nr; i++) {
 		display_progress(progress, i + 1);
 		commit = lookup_commit(the_repository, &oids->list[i]);
@@ -725,6 +727,8 @@ static void close_reachable(struct packed_oid_list *oids, int report_progress)
 		if (commit && !parse_commit(commit))
 			add_missing_parents(oids, commit);
 	}
+	if (oids->nr < oids_count_for_progress)
+		display_progress(progress, oids_count_for_progress);
 	stop_progress(&progress);
 
 	if (report_progress)
@@ -829,6 +833,7 @@ void write_commit_graph(const char *obj_dir,
 	uint64_t progress_cnt = 0;
 	struct strbuf progress_title = STRBUF_INIT;
 	unsigned long approx_nr_objects;
+	uint64_t oids_count_for_progress = 0;
 
 	if (!commit_graph_compatible(the_repository))
 		return;
@@ -934,9 +939,10 @@ void write_commit_graph(const char *obj_dir,
 		if (oids.progress_done < approx_nr_objects)
 			display_progress(oids.progress, approx_nr_objects);
 		stop_progress(&oids.progress);
+		oids_count_for_progress = oids.nr;
 	}
 
-	close_reachable(&oids, report_progress);
+	close_reachable(&oids, report_progress, oids_count_for_progress);
 
 	if (report_progress)
 		progress = start_delayed_progress(
Ævar Arnfjörð Bjarmason March 22, 2019, 2:36 p.m. UTC | #4
On Fri, Mar 22 2019, Ævar Arnfjörð Bjarmason wrote:

> On Fri, Mar 22 2019, SZEDER Gábor wrote:
>
>> On Fri, Mar 22, 2019 at 12:11:26PM +0100, Ævar Arnfjörð Bjarmason wrote:
>>>
>>> On Fri, Mar 22 2019, SZEDER Gábor wrote:
>>>
>>> > Commit 49bbc57a57 (commit-graph write: emit a percentage for all
>>> > progress, 2019-01-19) was a bit overeager when it added progress
>>> > percentages to the "Expanding reachable commits in commit graph" phase
>>> > as well, because most of the time the number of commits that phase has
>>> > to iterate over is not known in advance and grows significantly, and,
>>> > consequently, we end up with nonsensical numbers:
>>> >
>>> >   $ git commit-graph write --reachable
>>> >   Expanding reachable commits in commit graph: 138606% (824706/595), done.
>>> >   [...]
>>> >
>>> >   $ git rev-parse v5.0 | git commit-graph write --stdin-commits
>>> >   Expanding reachable commits in commit graph: 81264400% (812644/1), done.
>>> >   [...]
>>> >
>>> > Therefore, don't show progress percentages in the "Expanding reachable
>>> > commits in commit graph" phase.
>>>
>>> There's indeed a bug here as your examples show, but there *are* cases
>>> where it's correct, as the commit message for my patch on "master" shows
>>> there's cases where we correctly.
>>>
>>> So this "fixes" things by always removing the progress, why not instead
>>> pass down the state to close_reachable() about what we're walking over,
>>> so we can always show progress, or at least in some cases?
>>
>> The cases where it does display correct percentages are exceptional,
>> and doesn't worth the effort to try to find out whether ther current
>> operation happens to be such a case.
>
> It's the "write" entry point without arguments that displays the correct
> progress. So not exceptional, but yeah, it's not what we use on "gc".
>
> In any case, the problem is that sometimes we've walked the full set of
> commits already, and some other times we haven't.
>
> So in cases where we have we can show progress, and as a TODO (I think
> this came up in previous discussions), we could do better if we had a
> approximate_commit_count().
>
> In any case, the below fix seems correct to me, but I haven't poked it
> much. It *does* suffer from a theoretical race with the progress bar
> similar to d9b1b309cf ("commit-graph write: show progress for object
> search", 2019-01-19), but I work around it in the same way:

So double brainfart there, I meant "race with new objects being
added/removed", but ....

> diff --git a/commit-graph.c b/commit-graph.c
> index 47e9be0a3a..0fab3d8b2b 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -693,7 +693,8 @@ static void add_missing_parents(struct packed_oid_list *oids, struct commit *com
>  	}
>  }
>
> -static void close_reachable(struct packed_oid_list *oids, int report_progress)
> +static void close_reachable(struct packed_oid_list *oids, int report_progress,
> +			    uint64_t oids_count_for_progress)
>  {
>  	int i;
>  	struct commit *commit;
> @@ -717,7 +718,8 @@ static void close_reachable(struct packed_oid_list *oids, int report_progress)
>  	 */
>  	if (report_progress)
>  		progress = start_delayed_progress(
> -			_("Expanding reachable commits in commit graph"), oids->nr);
> +			_("Expanding reachable commits in commit graph"),
> +			oids_count_for_progress);
>  	for (i = 0; i < oids->nr; i++) {
>  		display_progress(progress, i + 1);
>  		commit = lookup_commit(the_repository, &oids->list[i]);
> @@ -725,6 +727,8 @@ static void close_reachable(struct packed_oid_list *oids, int report_progress)
>  		if (commit && !parse_commit(commit))
>  			add_missing_parents(oids, commit);
>  	}
> +	if (oids->nr < oids_count_for_progress)
> +		display_progress(progress, oids_count_for_progress);


...that race exists for the approximate_object_count case in d9b1b309cf
because in the time between approximate_object_count() and our second
walk via for_each_packed_object() the packs may have changed
(e.g. concurrent "git push").

But this won't happen in this case, because we already know the commits
we want to seed from, and the only thing that will "change" is that we
haven't walked the parents of some of those commits when we e.g. seed
from just HEAD.

But when oids_count_for_progress is >0 here we've walked all the commits
we found in the relevant packs already, it's not possible that they've
gained new parents in the interim, so this "oids->nr <
oids_count_for_progress" case here is pointless cargo-culting.

>  	stop_progress(&progress);
>
>  	if (report_progress)
> @@ -829,6 +833,7 @@ void write_commit_graph(const char *obj_dir,
>  	uint64_t progress_cnt = 0;
>  	struct strbuf progress_title = STRBUF_INIT;
>  	unsigned long approx_nr_objects;
> +	uint64_t oids_count_for_progress = 0;
>
>  	if (!commit_graph_compatible(the_repository))
>  		return;
> @@ -934,9 +939,10 @@ void write_commit_graph(const char *obj_dir,
>  		if (oids.progress_done < approx_nr_objects)
>  			display_progress(oids.progress, approx_nr_objects);
>  		stop_progress(&oids.progress);
> +		oids_count_for_progress = oids.nr;
>  	}
>
> -	close_reachable(&oids, report_progress);
> +	close_reachable(&oids, report_progress, oids_count_for_progress);
>
>  	if (report_progress)
>  		progress = start_delayed_progress(
SZEDER Gábor March 22, 2019, 2:55 p.m. UTC | #5
On Fri, Mar 22, 2019 at 03:28:34PM +0100, Ævar Arnfjörð Bjarmason wrote:
> 
> On Fri, Mar 22 2019, SZEDER Gábor wrote:
> 
> > On Fri, Mar 22, 2019 at 12:11:26PM +0100, Ævar Arnfjörð Bjarmason wrote:
> >>
> >> On Fri, Mar 22 2019, SZEDER Gábor wrote:
> >>
> >> > Commit 49bbc57a57 (commit-graph write: emit a percentage for all
> >> > progress, 2019-01-19) was a bit overeager when it added progress
> >> > percentages to the "Expanding reachable commits in commit graph" phase
> >> > as well, because most of the time the number of commits that phase has
> >> > to iterate over is not known in advance and grows significantly, and,
> >> > consequently, we end up with nonsensical numbers:
> >> >
> >> >   $ git commit-graph write --reachable
> >> >   Expanding reachable commits in commit graph: 138606% (824706/595), done.
> >> >   [...]
> >> >
> >> >   $ git rev-parse v5.0 | git commit-graph write --stdin-commits
> >> >   Expanding reachable commits in commit graph: 81264400% (812644/1), done.
> >> >   [...]
> >> >
> >> > Therefore, don't show progress percentages in the "Expanding reachable
> >> > commits in commit graph" phase.
> >>
> >> There's indeed a bug here as your examples show, but there *are* cases
> >> where it's correct, as the commit message for my patch on "master" shows
> >> there's cases where we correctly.
> >>
> >> So this "fixes" things by always removing the progress, why not instead
> >> pass down the state to close_reachable() about what we're walking over,
> >> so we can always show progress, or at least in some cases?
> >
> > The cases where it does display correct percentages are exceptional,
> > and doesn't worth the effort to try to find out whether ther current
> > operation happens to be such a case.
> 
> It's the "write" entry point without arguments that displays the correct
> progress. So not exceptional, but yeah, it's not what we use on "gc".

Bit it displays the correct number only if all the reachable commits
are in packfiles, which is not necessarily the case (e.g. unpacked
small packs during 'git fetch').

> In any case, the problem is that sometimes we've walked the full set of
> commits already, and some other times we haven't.

... and that we can't really be sure whether we've walked the full set
of commits until after this loop.

> So in cases where we have we can show progress, and as a TODO (I think
> this came up in previous discussions), we could do better if we had a
> approximate_commit_count().
> 
> In any case, the below fix seems correct to me, but I haven't poked it
> much. It *does* suffer from a theoretical race with the progress bar
> similar to d9b1b309cf ("commit-graph write: show progress for object
> search", 2019-01-19), but I work around it in the same way:
> 
> diff --git a/commit-graph.c b/commit-graph.c
> index 47e9be0a3a..0fab3d8b2b 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -693,7 +693,8 @@ static void add_missing_parents(struct packed_oid_list *oids, struct commit *com
>  	}
>  }
>  
> -static void close_reachable(struct packed_oid_list *oids, int report_progress)
> +static void close_reachable(struct packed_oid_list *oids, int report_progress,
> +			    uint64_t oids_count_for_progress)
>  {
>  	int i;
>  	struct commit *commit;
> @@ -717,7 +718,8 @@ static void close_reachable(struct packed_oid_list *oids, int report_progress)
>  	 */
>  	if (report_progress)
>  		progress = start_delayed_progress(
> -			_("Expanding reachable commits in commit graph"), oids->nr);
> +			_("Expanding reachable commits in commit graph"),
> +			oids_count_for_progress);
>  	for (i = 0; i < oids->nr; i++) {
>  		display_progress(progress, i + 1);
>  		commit = lookup_commit(the_repository, &oids->list[i]);
> @@ -725,6 +727,8 @@ static void close_reachable(struct packed_oid_list *oids, int report_progress)
>  		if (commit && !parse_commit(commit))
>  			add_missing_parents(oids, commit);
>  	}
> +	if (oids->nr < oids_count_for_progress)
> +		display_progress(progress, oids_count_for_progress);
>  	stop_progress(&progress);
>  
>  	if (report_progress)
> @@ -829,6 +833,7 @@ void write_commit_graph(const char *obj_dir,
>  	uint64_t progress_cnt = 0;
>  	struct strbuf progress_title = STRBUF_INIT;
>  	unsigned long approx_nr_objects;
> +	uint64_t oids_count_for_progress = 0;
>  
>  	if (!commit_graph_compatible(the_repository))
>  		return;
> @@ -934,9 +939,10 @@ void write_commit_graph(const char *obj_dir,
>  		if (oids.progress_done < approx_nr_objects)
>  			display_progress(oids.progress, approx_nr_objects);
>  		stop_progress(&oids.progress);
> +		oids_count_for_progress = oids.nr;
>  	}
>  
> -	close_reachable(&oids, report_progress);
> +	close_reachable(&oids, report_progress, oids_count_for_progress);
>  
>  	if (report_progress)
>  		progress = start_delayed_progress(
>
Ævar Arnfjörð Bjarmason March 22, 2019, 3:11 p.m. UTC | #6
On Fri, Mar 22 2019, SZEDER Gábor wrote:

> On Fri, Mar 22, 2019 at 03:28:34PM +0100, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Fri, Mar 22 2019, SZEDER Gábor wrote:
>>
>> > On Fri, Mar 22, 2019 at 12:11:26PM +0100, Ævar Arnfjörð Bjarmason wrote:
>> >>
>> >> On Fri, Mar 22 2019, SZEDER Gábor wrote:
>> >>
>> >> > Commit 49bbc57a57 (commit-graph write: emit a percentage for all
>> >> > progress, 2019-01-19) was a bit overeager when it added progress
>> >> > percentages to the "Expanding reachable commits in commit graph" phase
>> >> > as well, because most of the time the number of commits that phase has
>> >> > to iterate over is not known in advance and grows significantly, and,
>> >> > consequently, we end up with nonsensical numbers:
>> >> >
>> >> >   $ git commit-graph write --reachable
>> >> >   Expanding reachable commits in commit graph: 138606% (824706/595), done.
>> >> >   [...]
>> >> >
>> >> >   $ git rev-parse v5.0 | git commit-graph write --stdin-commits
>> >> >   Expanding reachable commits in commit graph: 81264400% (812644/1), done.
>> >> >   [...]
>> >> >
>> >> > Therefore, don't show progress percentages in the "Expanding reachable
>> >> > commits in commit graph" phase.
>> >>
>> >> There's indeed a bug here as your examples show, but there *are* cases
>> >> where it's correct, as the commit message for my patch on "master" shows
>> >> there's cases where we correctly.
>> >>
>> >> So this "fixes" things by always removing the progress, why not instead
>> >> pass down the state to close_reachable() about what we're walking over,
>> >> so we can always show progress, or at least in some cases?
>> >
>> > The cases where it does display correct percentages are exceptional,
>> > and doesn't worth the effort to try to find out whether ther current
>> > operation happens to be such a case.
>>
>> It's the "write" entry point without arguments that displays the correct
>> progress. So not exceptional, but yeah, it's not what we use on "gc".
>
> Bit it displays the correct number only if all the reachable commits
> are in packfiles, which is not necessarily the case (e.g. unpacked
> small packs during 'git fetch').

No, argument-less "write" only considers packed commits.

>> In any case, the problem is that sometimes we've walked the full set of
>> commits already, and some other times we haven't.
>
> ... and that we can't really be sure whether we've walked the full set
> of commits until after this loop.

I'm fairly sure we can when we start with a full walk. See my
explanation in <87imwbc6x8.fsf@evledraar.gmail.com>, but I may have
missed something.

>> So in cases where we have we can show progress, and as a TODO (I think
>> this came up in previous discussions), we could do better if we had a
>> approximate_commit_count().
>>
>> In any case, the below fix seems correct to me, but I haven't poked it
>> much. It *does* suffer from a theoretical race with the progress bar
>> similar to d9b1b309cf ("commit-graph write: show progress for object
>> search", 2019-01-19), but I work around it in the same way:
>>
>> diff --git a/commit-graph.c b/commit-graph.c
>> index 47e9be0a3a..0fab3d8b2b 100644
>> --- a/commit-graph.c
>> +++ b/commit-graph.c
>> @@ -693,7 +693,8 @@ static void add_missing_parents(struct packed_oid_list *oids, struct commit *com
>>  	}
>>  }
>>
>> -static void close_reachable(struct packed_oid_list *oids, int report_progress)
>> +static void close_reachable(struct packed_oid_list *oids, int report_progress,
>> +			    uint64_t oids_count_for_progress)
>>  {
>>  	int i;
>>  	struct commit *commit;
>> @@ -717,7 +718,8 @@ static void close_reachable(struct packed_oid_list *oids, int report_progress)
>>  	 */
>>  	if (report_progress)
>>  		progress = start_delayed_progress(
>> -			_("Expanding reachable commits in commit graph"), oids->nr);
>> +			_("Expanding reachable commits in commit graph"),
>> +			oids_count_for_progress);
>>  	for (i = 0; i < oids->nr; i++) {
>>  		display_progress(progress, i + 1);
>>  		commit = lookup_commit(the_repository, &oids->list[i]);
>> @@ -725,6 +727,8 @@ static void close_reachable(struct packed_oid_list *oids, int report_progress)
>>  		if (commit && !parse_commit(commit))
>>  			add_missing_parents(oids, commit);
>>  	}
>> +	if (oids->nr < oids_count_for_progress)
>> +		display_progress(progress, oids_count_for_progress);
>>  	stop_progress(&progress);
>>
>>  	if (report_progress)
>> @@ -829,6 +833,7 @@ void write_commit_graph(const char *obj_dir,
>>  	uint64_t progress_cnt = 0;
>>  	struct strbuf progress_title = STRBUF_INIT;
>>  	unsigned long approx_nr_objects;
>> +	uint64_t oids_count_for_progress = 0;
>>
>>  	if (!commit_graph_compatible(the_repository))
>>  		return;
>> @@ -934,9 +939,10 @@ void write_commit_graph(const char *obj_dir,
>>  		if (oids.progress_done < approx_nr_objects)
>>  			display_progress(oids.progress, approx_nr_objects);
>>  		stop_progress(&oids.progress);
>> +		oids_count_for_progress = oids.nr;
>>  	}
>>
>> -	close_reachable(&oids, report_progress);
>> +	close_reachable(&oids, report_progress, oids_count_for_progress);
>>
>>  	if (report_progress)
>>  		progress = start_delayed_progress(
>>
SZEDER Gábor March 22, 2019, 3:49 p.m. UTC | #7
On Fri, Mar 22, 2019 at 04:11:24PM +0100, Ævar Arnfjörð Bjarmason wrote:
> 
> On Fri, Mar 22 2019, SZEDER Gábor wrote:
> 
> > On Fri, Mar 22, 2019 at 03:28:34PM +0100, Ævar Arnfjörð Bjarmason wrote:
> >>
> >> On Fri, Mar 22 2019, SZEDER Gábor wrote:
> >>
> >> > On Fri, Mar 22, 2019 at 12:11:26PM +0100, Ævar Arnfjörð Bjarmason wrote:
> >> >>
> >> >> On Fri, Mar 22 2019, SZEDER Gábor wrote:
> >> >>
> >> >> > Commit 49bbc57a57 (commit-graph write: emit a percentage for all
> >> >> > progress, 2019-01-19) was a bit overeager when it added progress
> >> >> > percentages to the "Expanding reachable commits in commit graph" phase
> >> >> > as well, because most of the time the number of commits that phase has
> >> >> > to iterate over is not known in advance and grows significantly, and,
> >> >> > consequently, we end up with nonsensical numbers:
> >> >> >
> >> >> >   $ git commit-graph write --reachable
> >> >> >   Expanding reachable commits in commit graph: 138606% (824706/595), done.
> >> >> >   [...]
> >> >> >
> >> >> >   $ git rev-parse v5.0 | git commit-graph write --stdin-commits
> >> >> >   Expanding reachable commits in commit graph: 81264400% (812644/1), done.
> >> >> >   [...]
> >> >> >
> >> >> > Therefore, don't show progress percentages in the "Expanding reachable
> >> >> > commits in commit graph" phase.
> >> >>
> >> >> There's indeed a bug here as your examples show, but there *are* cases
> >> >> where it's correct, as the commit message for my patch on "master" shows
> >> >> there's cases where we correctly.
> >> >>
> >> >> So this "fixes" things by always removing the progress, why not instead
> >> >> pass down the state to close_reachable() about what we're walking over,
> >> >> so we can always show progress, or at least in some cases?
> >> >
> >> > The cases where it does display correct percentages are exceptional,
> >> > and doesn't worth the effort to try to find out whether ther current
> >> > operation happens to be such a case.
> >>
> >> It's the "write" entry point without arguments that displays the correct
> >> progress. So not exceptional, but yeah, it's not what we use on "gc".
> >
> > Bit it displays the correct number only if all the reachable commits
> > are in packfiles, which is not necessarily the case (e.g. unpacked
> > small packs during 'git fetch').
> 
> No, argument-less "write" only considers packed commits.

No, it considers packed commits as starting points, and then expands
to all reachable commits, that's what that loop in question is about.

  $ git init
  Initialized empty Git repository in /tmp/test/.git/
  $ echo >file
  $ git add file
  $ git commit -q -m initial
  $ echo 1 >file
  $ git commit -q -m 1 file
  $ git rev-parse HEAD | git pack-objects
  .git/objects/pack/pack
  Enumerating objects: 1, done.
  Counting objects: 100% (1/1), done.
  ece3ff72952af2b28e048fa5c58db88c44312876
  Writing objects: 100% (1/1), done.
  Total 1 (delta 0), reused 0 (delta 0)
  $ git commit-graph write
  Computing commit graph generation numbers: 100% (2/2), done.


> >> In any case, the problem is that sometimes we've walked the full set of
> >> commits already, and some other times we haven't.
> >
> > ... and that we can't really be sure whether we've walked the full set
> > of commits until after this loop.
> 
> I'm fairly sure we can when we start with a full walk. See my
> explanation in <87imwbc6x8.fsf@evledraar.gmail.com>, but I may have
> missed something.
> 
> >> So in cases where we have we can show progress, and as a TODO (I think
> >> this came up in previous discussions), we could do better if we had a
> >> approximate_commit_count().
> >>
> >> In any case, the below fix seems correct to me, but I haven't poked it
> >> much. It *does* suffer from a theoretical race with the progress bar
> >> similar to d9b1b309cf ("commit-graph write: show progress for object
> >> search", 2019-01-19), but I work around it in the same way:
> >>
> >> diff --git a/commit-graph.c b/commit-graph.c
> >> index 47e9be0a3a..0fab3d8b2b 100644
> >> --- a/commit-graph.c
> >> +++ b/commit-graph.c
> >> @@ -693,7 +693,8 @@ static void add_missing_parents(struct packed_oid_list *oids, struct commit *com
> >>  	}
> >>  }
> >>
> >> -static void close_reachable(struct packed_oid_list *oids, int report_progress)
> >> +static void close_reachable(struct packed_oid_list *oids, int report_progress,
> >> +			    uint64_t oids_count_for_progress)
> >>  {
> >>  	int i;
> >>  	struct commit *commit;
> >> @@ -717,7 +718,8 @@ static void close_reachable(struct packed_oid_list *oids, int report_progress)
> >>  	 */
> >>  	if (report_progress)
> >>  		progress = start_delayed_progress(
> >> -			_("Expanding reachable commits in commit graph"), oids->nr);
> >> +			_("Expanding reachable commits in commit graph"),
> >> +			oids_count_for_progress);
> >>  	for (i = 0; i < oids->nr; i++) {
> >>  		display_progress(progress, i + 1);
> >>  		commit = lookup_commit(the_repository, &oids->list[i]);
> >> @@ -725,6 +727,8 @@ static void close_reachable(struct packed_oid_list *oids, int report_progress)
> >>  		if (commit && !parse_commit(commit))
> >>  			add_missing_parents(oids, commit);
> >>  	}
> >> +	if (oids->nr < oids_count_for_progress)
> >> +		display_progress(progress, oids_count_for_progress);
> >>  	stop_progress(&progress);
> >>
> >>  	if (report_progress)
> >> @@ -829,6 +833,7 @@ void write_commit_graph(const char *obj_dir,
> >>  	uint64_t progress_cnt = 0;
> >>  	struct strbuf progress_title = STRBUF_INIT;
> >>  	unsigned long approx_nr_objects;
> >> +	uint64_t oids_count_for_progress = 0;
> >>
> >>  	if (!commit_graph_compatible(the_repository))
> >>  		return;
> >> @@ -934,9 +939,10 @@ void write_commit_graph(const char *obj_dir,
> >>  		if (oids.progress_done < approx_nr_objects)
> >>  			display_progress(oids.progress, approx_nr_objects);
> >>  		stop_progress(&oids.progress);
> >> +		oids_count_for_progress = oids.nr;
> >>  	}
> >>
> >> -	close_reachable(&oids, report_progress);
> >> +	close_reachable(&oids, report_progress, oids_count_for_progress);
> >>
> >>  	if (report_progress)
> >>  		progress = start_delayed_progress(
> >>
SZEDER Gábor March 22, 2019, 4:52 p.m. UTC | #8
On Fri, Mar 22, 2019 at 04:49:43PM +0100, SZEDER Gábor wrote:
> On Fri, Mar 22, 2019 at 04:11:24PM +0100, Ævar Arnfjörð Bjarmason wrote:
> > 
> > On Fri, Mar 22 2019, SZEDER Gábor wrote:
> > 
> > > On Fri, Mar 22, 2019 at 03:28:34PM +0100, Ævar Arnfjörð Bjarmason wrote:
> > >>
> > >> On Fri, Mar 22 2019, SZEDER Gábor wrote:
> > >>
> > >> > On Fri, Mar 22, 2019 at 12:11:26PM +0100, Ævar Arnfjörð Bjarmason wrote:
> > >> >>
> > >> >> On Fri, Mar 22 2019, SZEDER Gábor wrote:
> > >> >>
> > >> >> > Commit 49bbc57a57 (commit-graph write: emit a percentage for all
> > >> >> > progress, 2019-01-19) was a bit overeager when it added progress
> > >> >> > percentages to the "Expanding reachable commits in commit graph" phase
> > >> >> > as well, because most of the time the number of commits that phase has
> > >> >> > to iterate over is not known in advance and grows significantly, and,
> > >> >> > consequently, we end up with nonsensical numbers:
> > >> >> >
> > >> >> >   $ git commit-graph write --reachable
> > >> >> >   Expanding reachable commits in commit graph: 138606% (824706/595), done.
> > >> >> >   [...]
> > >> >> >
> > >> >> >   $ git rev-parse v5.0 | git commit-graph write --stdin-commits
> > >> >> >   Expanding reachable commits in commit graph: 81264400% (812644/1), done.
> > >> >> >   [...]
> > >> >> >
> > >> >> > Therefore, don't show progress percentages in the "Expanding reachable
> > >> >> > commits in commit graph" phase.
> > >> >>
> > >> >> There's indeed a bug here as your examples show, but there *are* cases
> > >> >> where it's correct, as the commit message for my patch on "master" shows
> > >> >> there's cases where we correctly.
> > >> >>
> > >> >> So this "fixes" things by always removing the progress, why not instead
> > >> >> pass down the state to close_reachable() about what we're walking over,
> > >> >> so we can always show progress, or at least in some cases?
> > >> >
> > >> > The cases where it does display correct percentages are exceptional,
> > >> > and doesn't worth the effort to try to find out whether ther current
> > >> > operation happens to be such a case.
> > >>
> > >> It's the "write" entry point without arguments that displays the correct
> > >> progress. So not exceptional, but yeah, it's not what we use on "gc".
> > >
> > > Bit it displays the correct number only if all the reachable commits
> > > are in packfiles, which is not necessarily the case (e.g. unpacked
> > > small packs during 'git fetch').
> > 
> > No, argument-less "write" only considers packed commits.
> 
> No, it considers packed commits as starting points, and then expands
> to all reachable commits, that's what that loop in question is about.
> 
>   $ git init
>   Initialized empty Git repository in /tmp/test/.git/
>   $ echo >file
>   $ git add file
>   $ git commit -q -m initial
>   $ echo 1 >file
>   $ git commit -q -m 1 file
>   $ git rev-parse HEAD | git pack-objects
>   .git/objects/pack/pack
>   Enumerating objects: 1, done.
>   Counting objects: 100% (1/1), done.
>   ece3ff72952af2b28e048fa5c58db88c44312876
>   Writing objects: 100% (1/1), done.
>   Total 1 (delta 0), reused 0 (delta 0)
>   $ git commit-graph write
>   Computing commit graph generation numbers: 100% (2/2), done.
> 
> 
> > >> In any case, the problem is that sometimes we've walked the full set of
> > >> commits already, and some other times we haven't.
> > >
> > > ... and that we can't really be sure whether we've walked the full set
> > > of commits until after this loop.
> > 
> > I'm fairly sure we can when we start with a full walk. See my
> > explanation in <87imwbc6x8.fsf@evledraar.gmail.com>, but I may have
> > missed something.

BTW, if we know for _sure_ that we started with the full set of
commits, then we don't have to iterate over all the history in this
"Expanding reachable commits..." phase in the first place, because
there is nowhere to expand anyway.  Maybe we wouldn't even have to
call close_reachable() in that case.
Ævar Arnfjörð Bjarmason March 22, 2019, 5:23 p.m. UTC | #9
On Fri, Mar 22 2019, SZEDER Gábor wrote:

> On Fri, Mar 22, 2019 at 04:11:24PM +0100, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Fri, Mar 22 2019, SZEDER Gábor wrote:
>>
>> > On Fri, Mar 22, 2019 at 03:28:34PM +0100, Ævar Arnfjörð Bjarmason wrote:
>> >>
>> >> On Fri, Mar 22 2019, SZEDER Gábor wrote:
>> >>
>> >> > On Fri, Mar 22, 2019 at 12:11:26PM +0100, Ævar Arnfjörð Bjarmason wrote:
>> >> >>
>> >> >> On Fri, Mar 22 2019, SZEDER Gábor wrote:
>> >> >>
>> >> >> > Commit 49bbc57a57 (commit-graph write: emit a percentage for all
>> >> >> > progress, 2019-01-19) was a bit overeager when it added progress
>> >> >> > percentages to the "Expanding reachable commits in commit graph" phase
>> >> >> > as well, because most of the time the number of commits that phase has
>> >> >> > to iterate over is not known in advance and grows significantly, and,
>> >> >> > consequently, we end up with nonsensical numbers:
>> >> >> >
>> >> >> >   $ git commit-graph write --reachable
>> >> >> >   Expanding reachable commits in commit graph: 138606% (824706/595), done.
>> >> >> >   [...]
>> >> >> >
>> >> >> >   $ git rev-parse v5.0 | git commit-graph write --stdin-commits
>> >> >> >   Expanding reachable commits in commit graph: 81264400% (812644/1), done.
>> >> >> >   [...]
>> >> >> >
>> >> >> > Therefore, don't show progress percentages in the "Expanding reachable
>> >> >> > commits in commit graph" phase.
>> >> >>
>> >> >> There's indeed a bug here as your examples show, but there *are* cases
>> >> >> where it's correct, as the commit message for my patch on "master" shows
>> >> >> there's cases where we correctly.
>> >> >>
>> >> >> So this "fixes" things by always removing the progress, why not instead
>> >> >> pass down the state to close_reachable() about what we're walking over,
>> >> >> so we can always show progress, or at least in some cases?
>> >> >
>> >> > The cases where it does display correct percentages are exceptional,
>> >> > and doesn't worth the effort to try to find out whether ther current
>> >> > operation happens to be such a case.
>> >>
>> >> It's the "write" entry point without arguments that displays the correct
>> >> progress. So not exceptional, but yeah, it's not what we use on "gc".
>> >
>> > Bit it displays the correct number only if all the reachable commits
>> > are in packfiles, which is not necessarily the case (e.g. unpacked
>> > small packs during 'git fetch').
>>
>> No, argument-less "write" only considers packed commits.
>
> No, it considers packed commits as starting points, and then expands
> to all reachable commits, that's what that loop in question is about.
>
>   $ git init
>   Initialized empty Git repository in /tmp/test/.git/
>   $ echo >file
>   $ git add file
>   $ git commit -q -m initial
>   $ echo 1 >file
>   $ git commit -q -m 1 file
>   $ git rev-parse HEAD | git pack-objects
>   .git/objects/pack/pack
>   Enumerating objects: 1, done.
>   Counting objects: 100% (1/1), done.
>   ece3ff72952af2b28e048fa5c58db88c44312876
>   Writing objects: 100% (1/1), done.
>   Total 1 (delta 0), reused 0 (delta 0)
>   $ git commit-graph write
>   Computing commit graph generation numbers: 100% (2/2), done.

Ah, indeed. I think in practice it'll be unlikely to happen in practice
except on servers due to receive.unpackLimit, and then it won't be off
by much.

So I think it's best to do something like my WIP patch upthread + the
"snap to 100%" at the end behavior. I'll try to clean that up / test it
/ submit that sometime soon.

>> >> In any case, the problem is that sometimes we've walked the full set of
>> >> commits already, and some other times we haven't.
>> >
>> > ... and that we can't really be sure whether we've walked the full set
>> > of commits until after this loop.
>>
>> I'm fairly sure we can when we start with a full walk. See my
>> explanation in <87imwbc6x8.fsf@evledraar.gmail.com>, but I may have
>> missed something.
>>
>> >> So in cases where we have we can show progress, and as a TODO (I think
>> >> this came up in previous discussions), we could do better if we had a
>> >> approximate_commit_count().
>> >>
>> >> In any case, the below fix seems correct to me, but I haven't poked it
>> >> much. It *does* suffer from a theoretical race with the progress bar
>> >> similar to d9b1b309cf ("commit-graph write: show progress for object
>> >> search", 2019-01-19), but I work around it in the same way:
>> >>
>> >> diff --git a/commit-graph.c b/commit-graph.c
>> >> index 47e9be0a3a..0fab3d8b2b 100644
>> >> --- a/commit-graph.c
>> >> +++ b/commit-graph.c
>> >> @@ -693,7 +693,8 @@ static void add_missing_parents(struct packed_oid_list *oids, struct commit *com
>> >>  	}
>> >>  }
>> >>
>> >> -static void close_reachable(struct packed_oid_list *oids, int report_progress)
>> >> +static void close_reachable(struct packed_oid_list *oids, int report_progress,
>> >> +			    uint64_t oids_count_for_progress)
>> >>  {
>> >>  	int i;
>> >>  	struct commit *commit;
>> >> @@ -717,7 +718,8 @@ static void close_reachable(struct packed_oid_list *oids, int report_progress)
>> >>  	 */
>> >>  	if (report_progress)
>> >>  		progress = start_delayed_progress(
>> >> -			_("Expanding reachable commits in commit graph"), oids->nr);
>> >> +			_("Expanding reachable commits in commit graph"),
>> >> +			oids_count_for_progress);
>> >>  	for (i = 0; i < oids->nr; i++) {
>> >>  		display_progress(progress, i + 1);
>> >>  		commit = lookup_commit(the_repository, &oids->list[i]);
>> >> @@ -725,6 +727,8 @@ static void close_reachable(struct packed_oid_list *oids, int report_progress)
>> >>  		if (commit && !parse_commit(commit))
>> >>  			add_missing_parents(oids, commit);
>> >>  	}
>> >> +	if (oids->nr < oids_count_for_progress)
>> >> +		display_progress(progress, oids_count_for_progress);
>> >>  	stop_progress(&progress);
>> >>
>> >>  	if (report_progress)
>> >> @@ -829,6 +833,7 @@ void write_commit_graph(const char *obj_dir,
>> >>  	uint64_t progress_cnt = 0;
>> >>  	struct strbuf progress_title = STRBUF_INIT;
>> >>  	unsigned long approx_nr_objects;
>> >> +	uint64_t oids_count_for_progress = 0;
>> >>
>> >>  	if (!commit_graph_compatible(the_repository))
>> >>  		return;
>> >> @@ -934,9 +939,10 @@ void write_commit_graph(const char *obj_dir,
>> >>  		if (oids.progress_done < approx_nr_objects)
>> >>  			display_progress(oids.progress, approx_nr_objects);
>> >>  		stop_progress(&oids.progress);
>> >> +		oids_count_for_progress = oids.nr;
>> >>  	}
>> >>
>> >> -	close_reachable(&oids, report_progress);
>> >> +	close_reachable(&oids, report_progress, oids_count_for_progress);
>> >>
>> >>  	if (report_progress)
>> >>  		progress = start_delayed_progress(
>> >>

Patch
diff mbox series

diff --git a/commit-graph.c b/commit-graph.c
index 017225ccea..60c06ce58f 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -672,13 +672,13 @@  static void close_reachable(struct packed_oid_list *oids, int report_progress)
 	 * As this loop runs, oids->nr may grow, but not more
 	 * than the number of missing commits in the reachable
 	 * closure.
 	 */
 	if (report_progress)
 		progress = start_delayed_progress(
-			_("Expanding reachable commits in commit graph"), oids->nr);
+			_("Expanding reachable commits in commit graph"), 0);
 	for (i = 0; i < oids->nr; i++) {
 		display_progress(progress, i + 1);
 		commit = lookup_commit(the_repository, &oids->list[i]);
 
 		if (commit && !parse_commit(commit))
 			add_missing_parents(oids, commit);