diff mbox series

[3/3] entry: show finer-grained counter in "Filtering content" progress line

Message ID patch-3.3-f65001627a-20210722T121801Z-avarab@gmail.com (mailing list archive)
State Superseded
Headers show
Series progress.c API users: fix bogus counting | expand

Commit Message

Ævar Arnfjörð Bjarmason July 22, 2021, 12:20 p.m. UTC
From: SZEDER Gábor <szeder.dev@gmail.com>

The "Filtering content" progress in entry.c:finish_delayed_checkout()
is unusual because of how it calculates the progress count and because
it shows the progress of a nested loop.  It works basically like this:

  start_delayed_progress(p, nr_of_paths_to_filter)
  for_each_filter {
      display_progress(p, nr_of_paths_to_filter - nr_of_paths_still_left_to_filter)
      for_each_path_handled_by_the_current_filter {
          checkout_entry()
      }
  }
  stop_progress(p)

There are two issues with this approach:

  - The work done by the last filter (or the only filter if there is
    only one) is never counted, so if the last filter still has some
    paths to process, then the counter shown in the "done" progress
    line will not match the expected total.

    This would cause a BUG() in an upcoming change that adds an
    assertion checking if the "total" at the end matches the last
    progress bar update..

    This is because both use only one filter.  (The test 'delayed
    checkout in process filter' uses two filters but the first one
    does all the work, so that test already happens to succeed even
    with such an assertion.)

  - The progress counter is updated only once per filter, not once per
    processed path, so if a filter has a lot of paths to process, then
    the counter might stay unchanged for a long while and then make a
    big jump (though the user still gets a sense of progress, because
    we call display_throughput() after each processed path to show the
    amount of processed data).

Move the display_progress() call to the inner loop, right next to that
checkout_entry() call that does the hard work for each path, and use a
dedicated counter variable that is incremented upon processing each
path.

After this change the 'invalid file in delayed checkout' in
't0021-conversion.sh' would succeed with the future BUG() assertion
discussed above but the 'missing file in delayed checkout' test would
still fail, because its purposefully buggy filter doesn't process any
paths, so we won't execute that inner loop at all (this will be fixed
in a subsequent commit).

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 entry.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

Comments

Junio C Hamano July 23, 2021, 10:01 p.m. UTC | #1
Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> From: SZEDER Gábor <szeder.dev@gmail.com>
>
> The "Filtering content" progress in entry.c:finish_delayed_checkout()
> is unusual because of how it calculates the progress count and because
> it shows the progress of a nested loop.  It works basically like this:
>
>   start_delayed_progress(p, nr_of_paths_to_filter)
>   for_each_filter {
>       display_progress(p, nr_of_paths_to_filter - nr_of_paths_still_left_to_filter)
>       for_each_path_handled_by_the_current_filter {
>           checkout_entry()
>       }
>   }
>   stop_progress(p)
>
> There are two issues with this approach:
>
>   - The work done by the last filter (or the only filter if there is
>     only one) is never counted, so if the last filter still has some
>     paths to process, then the counter shown in the "done" progress
>     line will not match the expected total.
>
>     This would cause a BUG() in an upcoming change that adds an
>     assertion checking if the "total" at the end matches the last
>     progress bar update..

So the other series will semantically depend on this 3-patch series?
Just making sure that is the intended topic structure.

> diff --git a/entry.c b/entry.c
> index 125fabdbd5..d92dd020b3 100644
> --- a/entry.c
> +++ b/entry.c
> @@ -162,7 +162,7 @@ static int remove_available_paths(struct string_list_item *item, void *cb_data)
>  int finish_delayed_checkout(struct checkout *state, int *nr_checkouts)
>  {
>  	int errs = 0;
> -	unsigned delayed_object_count;
> +	unsigned processed_paths = 0;
>  	off_t filtered_bytes = 0;
>  	struct string_list_item *filter, *path;
>  	struct progress *progress;
> @@ -172,12 +172,10 @@ int finish_delayed_checkout(struct checkout *state, int *nr_checkouts)
>  		return errs;
>  
>  	dco->state = CE_RETRY;
> -	delayed_object_count = dco->paths.nr;
> -	progress = start_delayed_progress(_("Filtering content"), delayed_object_count);
> +	progress = start_delayed_progress(_("Filtering content"), dco->paths.nr);
>  	while (dco->filters.nr > 0) {
>  		for_each_string_list_item(filter, &dco->filters) {
>  			struct string_list available_paths = STRING_LIST_INIT_NODUP;
> -			display_progress(progress, delayed_object_count - dco->paths.nr);
>  
>  			if (!async_query_available_blobs(filter->string, &available_paths)) {
>  				/* Filter reported an error */
> @@ -224,6 +222,7 @@ int finish_delayed_checkout(struct checkout *state, int *nr_checkouts)
>  				ce = index_file_exists(state->istate, path->string,
>  						       strlen(path->string), 0);
>  				if (ce) {
> +					display_progress(progress, ++processed_paths);
>  					errs |= checkout_entry(ce, state, NULL, nr_checkouts);
>  					filtered_bytes += ce->ce_stat_data.sd_size;
>  					display_throughput(progress, filtered_bytes);

Hmph.  A missing cache entries will not increment processed; would
that cause stop_progress() to see at the end the counter that is
smaller than dco->paths.nr we saw at the beginning?
SZEDER Gábor Aug. 2, 2021, 9:48 p.m. UTC | #2
On Thu, Jul 22, 2021 at 02:20:17PM +0200, Ævar Arnfjörð Bjarmason wrote:
> From: SZEDER Gábor <szeder.dev@gmail.com>
> 
> The "Filtering content" progress in entry.c:finish_delayed_checkout()
> is unusual because of how it calculates the progress count and because
> it shows the progress of a nested loop.  It works basically like this:
> 
>   start_delayed_progress(p, nr_of_paths_to_filter)
>   for_each_filter {
>       display_progress(p, nr_of_paths_to_filter - nr_of_paths_still_left_to_filter)
>       for_each_path_handled_by_the_current_filter {
>           checkout_entry()
>       }
>   }
>   stop_progress(p)
> 
> There are two issues with this approach:
> 
>   - The work done by the last filter (or the only filter if there is
>     only one) is never counted, so if the last filter still has some
>     paths to process, then the counter shown in the "done" progress
>     line will not match the expected total.
> 
>     This would cause a BUG() in an upcoming change that adds an
>     assertion checking if the "total" at the end matches the last
>     progress bar update..
> 
>     This is because both

"Both" what?

>     use only one filter.  (The test 'delayed
>     checkout in process filter' uses two filters but the first one
>     does all the work, so that test already happens to succeed even
>     with such an assertion.)
> 
>   - The progress counter is updated only once per filter, not once per
>     processed path, so if a filter has a lot of paths to process, then
>     the counter might stay unchanged for a long while and then make a
>     big jump (though the user still gets a sense of progress, because
>     we call display_throughput() after each processed path to show the
>     amount of processed data).
> 
> Move the display_progress() call to the inner loop, right next to that
> checkout_entry() call that does the hard work for each path, and use a
> dedicated counter variable that is incremented upon processing each
> path.
> 
> After this change the 'invalid file in delayed checkout' in
> 't0021-conversion.sh' would succeed with the future BUG() assertion
> discussed above but the 'missing file in delayed checkout' test would
> still fail, because its purposefully buggy filter doesn't process any
> paths, so we won't execute that inner loop at all (this will be fixed
> in a subsequent commit).

I don't like how the updates to the commit message keeps referring to
some future BUG().

A benefit of my original submission is that all those checks are added
at the beginning of the patch series, so when looking at these later
bugfix patches reviewers can easily run the problematic tests with
GIT_TEST_CHECK_PROGRESS=1 to see the failure themselves and to confirm
that the fix indeed works.  Without those checks the next best thing
is applying the patch below and then looking at the verbose log of
those tests:


diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index b5749f327d..93a67f2f1f 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -955,7 +955,8 @@ test_expect_success PERL 'missing file in delayed checkout' '
 	) &&
 
 	rm -rf repo-cloned &&
-	test_must_fail git clone repo repo-cloned 2>git-stderr.log &&
+	test_must_fail env GIT_PROGRESS_DELAY=0 git clone repo repo-cloned 2>git-stderr.log &&
+	cat git-stderr.log &&
 	grep "error: .missing-delay\.a. was not filtered properly" git-stderr.log
 '
 
@@ -976,7 +977,8 @@ test_expect_success PERL 'invalid file in delayed checkout' '
 	) &&
 
 	rm -rf repo-cloned &&
-	test_must_fail git clone repo repo-cloned 2>git-stderr.log &&
+	test_must_fail env GIT_PROGRESS_DELAY=0 git clone repo repo-cloned 2>git-stderr.log &&
+	cat git-stderr.log &&
 	grep "error: external filter .* signaled that .unfiltered. is now available although it has not been delayed earlier" git-stderr.log
 '
 

> Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  entry.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/entry.c b/entry.c
> index 125fabdbd5..d92dd020b3 100644
> --- a/entry.c
> +++ b/entry.c
> @@ -162,7 +162,7 @@ static int remove_available_paths(struct string_list_item *item, void *cb_data)
>  int finish_delayed_checkout(struct checkout *state, int *nr_checkouts)
>  {
>  	int errs = 0;
> -	unsigned delayed_object_count;
> +	unsigned processed_paths = 0;
>  	off_t filtered_bytes = 0;
>  	struct string_list_item *filter, *path;
>  	struct progress *progress;
> @@ -172,12 +172,10 @@ int finish_delayed_checkout(struct checkout *state, int *nr_checkouts)
>  		return errs;
>  
>  	dco->state = CE_RETRY;
> -	delayed_object_count = dco->paths.nr;
> -	progress = start_delayed_progress(_("Filtering content"), delayed_object_count);
> +	progress = start_delayed_progress(_("Filtering content"), dco->paths.nr);
>  	while (dco->filters.nr > 0) {
>  		for_each_string_list_item(filter, &dco->filters) {
>  			struct string_list available_paths = STRING_LIST_INIT_NODUP;
> -			display_progress(progress, delayed_object_count - dco->paths.nr);
>  
>  			if (!async_query_available_blobs(filter->string, &available_paths)) {
>  				/* Filter reported an error */
> @@ -224,6 +222,7 @@ int finish_delayed_checkout(struct checkout *state, int *nr_checkouts)
>  				ce = index_file_exists(state->istate, path->string,
>  						       strlen(path->string), 0);
>  				if (ce) {
> +					display_progress(progress, ++processed_paths);
>  					errs |= checkout_entry(ce, state, NULL, nr_checkouts);
>  					filtered_bytes += ce->ce_stat_data.sd_size;
>  					display_throughput(progress, filtered_bytes);
> -- 
> 2.32.0.957.gd9e39d72fe6
>
SZEDER Gábor Aug. 2, 2021, 10:05 p.m. UTC | #3
On Fri, Jul 23, 2021 at 03:01:48PM -0700, Junio C Hamano wrote:
> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
> 
> > From: SZEDER Gábor <szeder.dev@gmail.com>
> >
> > The "Filtering content" progress in entry.c:finish_delayed_checkout()
> > is unusual because of how it calculates the progress count and because
> > it shows the progress of a nested loop.  It works basically like this:
> >
> >   start_delayed_progress(p, nr_of_paths_to_filter)
> >   for_each_filter {
> >       display_progress(p, nr_of_paths_to_filter - nr_of_paths_still_left_to_filter)
> >       for_each_path_handled_by_the_current_filter {
> >           checkout_entry()
> >       }
> >   }
> >   stop_progress(p)
> >
> > There are two issues with this approach:
> >
> >   - The work done by the last filter (or the only filter if there is
> >     only one) is never counted, so if the last filter still has some
> >     paths to process, then the counter shown in the "done" progress
> >     line will not match the expected total.
> >
> >     This would cause a BUG() in an upcoming change that adds an
> >     assertion checking if the "total" at the end matches the last
> >     progress bar update..
> 
> So the other series will semantically depend on this 3-patch series?
> Just making sure that is the intended topic structure.
> 
> > diff --git a/entry.c b/entry.c
> > index 125fabdbd5..d92dd020b3 100644
> > --- a/entry.c
> > +++ b/entry.c
> > @@ -162,7 +162,7 @@ static int remove_available_paths(struct string_list_item *item, void *cb_data)
> >  int finish_delayed_checkout(struct checkout *state, int *nr_checkouts)
> >  {
> >  	int errs = 0;
> > -	unsigned delayed_object_count;
> > +	unsigned processed_paths = 0;
> >  	off_t filtered_bytes = 0;
> >  	struct string_list_item *filter, *path;
> >  	struct progress *progress;
> > @@ -172,12 +172,10 @@ int finish_delayed_checkout(struct checkout *state, int *nr_checkouts)
> >  		return errs;
> >  
> >  	dco->state = CE_RETRY;
> > -	delayed_object_count = dco->paths.nr;
> > -	progress = start_delayed_progress(_("Filtering content"), delayed_object_count);
> > +	progress = start_delayed_progress(_("Filtering content"), dco->paths.nr);
> >  	while (dco->filters.nr > 0) {
> >  		for_each_string_list_item(filter, &dco->filters) {
> >  			struct string_list available_paths = STRING_LIST_INIT_NODUP;
> > -			display_progress(progress, delayed_object_count - dco->paths.nr);
> >  
> >  			if (!async_query_available_blobs(filter->string, &available_paths)) {
> >  				/* Filter reported an error */
> > @@ -224,6 +222,7 @@ int finish_delayed_checkout(struct checkout *state, int *nr_checkouts)
> >  				ce = index_file_exists(state->istate, path->string,
> >  						       strlen(path->string), 0);
> >  				if (ce) {
> > +					display_progress(progress, ++processed_paths);
> >  					errs |= checkout_entry(ce, state, NULL, nr_checkouts);
> >  					filtered_bytes += ce->ce_stat_data.sd_size;
> >  					display_throughput(progress, filtered_bytes);
> 
> Hmph.  A missing cache entries will not increment processed; would
> that cause stop_progress() to see at the end the counter that is
> smaller than dco->paths.nr we saw at the beginning?

Yes, but this 'if (ce)' condition has an 'else errs = 1;' branch as
well, i.e. a missing cache entry is considered an error.  This patch
fixes issues with this progress bar in the case when all went well,
i.e. when there were no errors.  In my original submission it is
followed up by another patch that attempts to fix this progress line
when there are errors, arguing that it's wrong to show "done" at the
end when not all work was done because of said errors...  although
"fix" is not quite the right word for the approach taken in that
patch, it's more like "papering over" ;)

  https://public-inbox.org/git/20210620200303.2328957-7-szeder.dev@gmail.com/
diff mbox series

Patch

diff --git a/entry.c b/entry.c
index 125fabdbd5..d92dd020b3 100644
--- a/entry.c
+++ b/entry.c
@@ -162,7 +162,7 @@  static int remove_available_paths(struct string_list_item *item, void *cb_data)
 int finish_delayed_checkout(struct checkout *state, int *nr_checkouts)
 {
 	int errs = 0;
-	unsigned delayed_object_count;
+	unsigned processed_paths = 0;
 	off_t filtered_bytes = 0;
 	struct string_list_item *filter, *path;
 	struct progress *progress;
@@ -172,12 +172,10 @@  int finish_delayed_checkout(struct checkout *state, int *nr_checkouts)
 		return errs;
 
 	dco->state = CE_RETRY;
-	delayed_object_count = dco->paths.nr;
-	progress = start_delayed_progress(_("Filtering content"), delayed_object_count);
+	progress = start_delayed_progress(_("Filtering content"), dco->paths.nr);
 	while (dco->filters.nr > 0) {
 		for_each_string_list_item(filter, &dco->filters) {
 			struct string_list available_paths = STRING_LIST_INIT_NODUP;
-			display_progress(progress, delayed_object_count - dco->paths.nr);
 
 			if (!async_query_available_blobs(filter->string, &available_paths)) {
 				/* Filter reported an error */
@@ -224,6 +222,7 @@  int finish_delayed_checkout(struct checkout *state, int *nr_checkouts)
 				ce = index_file_exists(state->istate, path->string,
 						       strlen(path->string), 0);
 				if (ce) {
+					display_progress(progress, ++processed_paths);
 					errs |= checkout_entry(ce, state, NULL, nr_checkouts);
 					filtered_bytes += ce->ce_stat_data.sd_size;
 					display_throughput(progress, filtered_bytes);