Message ID | patch-3.3-f65001627a-20210722T121801Z-avarab@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | progress.c API users: fix bogus counting | expand |
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > From: SZEDER Gábor <szeder.dev@gmail.com> > > The "Filtering content" progress in entry.c:finish_delayed_checkout() > is unusual because of how it calculates the progress count and because > it shows the progress of a nested loop. It works basically like this: > > start_delayed_progress(p, nr_of_paths_to_filter) > for_each_filter { > display_progress(p, nr_of_paths_to_filter - nr_of_paths_still_left_to_filter) > for_each_path_handled_by_the_current_filter { > checkout_entry() > } > } > stop_progress(p) > > There are two issues with this approach: > > - The work done by the last filter (or the only filter if there is > only one) is never counted, so if the last filter still has some > paths to process, then the counter shown in the "done" progress > line will not match the expected total. > > This would cause a BUG() in an upcoming change that adds an > assertion checking if the "total" at the end matches the last > progress bar update.. So the other series will semantically depend on this 3-patch series? Just making sure that is the intended topic structure. > diff --git a/entry.c b/entry.c > index 125fabdbd5..d92dd020b3 100644 > --- a/entry.c > +++ b/entry.c > @@ -162,7 +162,7 @@ static int remove_available_paths(struct string_list_item *item, void *cb_data) > int finish_delayed_checkout(struct checkout *state, int *nr_checkouts) > { > int errs = 0; > - unsigned delayed_object_count; > + unsigned processed_paths = 0; > off_t filtered_bytes = 0; > struct string_list_item *filter, *path; > struct progress *progress; > @@ -172,12 +172,10 @@ int finish_delayed_checkout(struct checkout *state, int *nr_checkouts) > return errs; > > dco->state = CE_RETRY; > - delayed_object_count = dco->paths.nr; > - progress = start_delayed_progress(_("Filtering content"), delayed_object_count); > + progress = start_delayed_progress(_("Filtering content"), dco->paths.nr); > while (dco->filters.nr > 0) { > for_each_string_list_item(filter, &dco->filters) { > struct string_list available_paths = STRING_LIST_INIT_NODUP; > - display_progress(progress, delayed_object_count - dco->paths.nr); > > if (!async_query_available_blobs(filter->string, &available_paths)) { > /* Filter reported an error */ > @@ -224,6 +222,7 @@ int finish_delayed_checkout(struct checkout *state, int *nr_checkouts) > ce = index_file_exists(state->istate, path->string, > strlen(path->string), 0); > if (ce) { > + display_progress(progress, ++processed_paths); > errs |= checkout_entry(ce, state, NULL, nr_checkouts); > filtered_bytes += ce->ce_stat_data.sd_size; > display_throughput(progress, filtered_bytes); Hmph. A missing cache entries will not increment processed; would that cause stop_progress() to see at the end the counter that is smaller than dco->paths.nr we saw at the beginning?
On Thu, Jul 22, 2021 at 02:20:17PM +0200, Ævar Arnfjörð Bjarmason wrote: > From: SZEDER Gábor <szeder.dev@gmail.com> > > The "Filtering content" progress in entry.c:finish_delayed_checkout() > is unusual because of how it calculates the progress count and because > it shows the progress of a nested loop. It works basically like this: > > start_delayed_progress(p, nr_of_paths_to_filter) > for_each_filter { > display_progress(p, nr_of_paths_to_filter - nr_of_paths_still_left_to_filter) > for_each_path_handled_by_the_current_filter { > checkout_entry() > } > } > stop_progress(p) > > There are two issues with this approach: > > - The work done by the last filter (or the only filter if there is > only one) is never counted, so if the last filter still has some > paths to process, then the counter shown in the "done" progress > line will not match the expected total. > > This would cause a BUG() in an upcoming change that adds an > assertion checking if the "total" at the end matches the last > progress bar update.. > > This is because both "Both" what? > use only one filter. (The test 'delayed > checkout in process filter' uses two filters but the first one > does all the work, so that test already happens to succeed even > with such an assertion.) > > - The progress counter is updated only once per filter, not once per > processed path, so if a filter has a lot of paths to process, then > the counter might stay unchanged for a long while and then make a > big jump (though the user still gets a sense of progress, because > we call display_throughput() after each processed path to show the > amount of processed data). > > Move the display_progress() call to the inner loop, right next to that > checkout_entry() call that does the hard work for each path, and use a > dedicated counter variable that is incremented upon processing each > path. > > After this change the 'invalid file in delayed checkout' in > 't0021-conversion.sh' would succeed with the future BUG() assertion > discussed above but the 'missing file in delayed checkout' test would > still fail, because its purposefully buggy filter doesn't process any > paths, so we won't execute that inner loop at all (this will be fixed > in a subsequent commit). I don't like how the updates to the commit message keeps referring to some future BUG(). A benefit of my original submission is that all those checks are added at the beginning of the patch series, so when looking at these later bugfix patches reviewers can easily run the problematic tests with GIT_TEST_CHECK_PROGRESS=1 to see the failure themselves and to confirm that the fix indeed works. Without those checks the next best thing is applying the patch below and then looking at the verbose log of those tests: diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh index b5749f327d..93a67f2f1f 100755 --- a/t/t0021-conversion.sh +++ b/t/t0021-conversion.sh @@ -955,7 +955,8 @@ test_expect_success PERL 'missing file in delayed checkout' ' ) && rm -rf repo-cloned && - test_must_fail git clone repo repo-cloned 2>git-stderr.log && + test_must_fail env GIT_PROGRESS_DELAY=0 git clone repo repo-cloned 2>git-stderr.log && + cat git-stderr.log && grep "error: .missing-delay\.a. was not filtered properly" git-stderr.log ' @@ -976,7 +977,8 @@ test_expect_success PERL 'invalid file in delayed checkout' ' ) && rm -rf repo-cloned && - test_must_fail git clone repo repo-cloned 2>git-stderr.log && + test_must_fail env GIT_PROGRESS_DELAY=0 git clone repo repo-cloned 2>git-stderr.log && + cat git-stderr.log && grep "error: external filter .* signaled that .unfiltered. is now available although it has not been delayed earlier" git-stderr.log ' > Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> > Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> > --- > entry.c | 7 +++---- > 1 file changed, 3 insertions(+), 4 deletions(-) > > diff --git a/entry.c b/entry.c > index 125fabdbd5..d92dd020b3 100644 > --- a/entry.c > +++ b/entry.c > @@ -162,7 +162,7 @@ static int remove_available_paths(struct string_list_item *item, void *cb_data) > int finish_delayed_checkout(struct checkout *state, int *nr_checkouts) > { > int errs = 0; > - unsigned delayed_object_count; > + unsigned processed_paths = 0; > off_t filtered_bytes = 0; > struct string_list_item *filter, *path; > struct progress *progress; > @@ -172,12 +172,10 @@ int finish_delayed_checkout(struct checkout *state, int *nr_checkouts) > return errs; > > dco->state = CE_RETRY; > - delayed_object_count = dco->paths.nr; > - progress = start_delayed_progress(_("Filtering content"), delayed_object_count); > + progress = start_delayed_progress(_("Filtering content"), dco->paths.nr); > while (dco->filters.nr > 0) { > for_each_string_list_item(filter, &dco->filters) { > struct string_list available_paths = STRING_LIST_INIT_NODUP; > - display_progress(progress, delayed_object_count - dco->paths.nr); > > if (!async_query_available_blobs(filter->string, &available_paths)) { > /* Filter reported an error */ > @@ -224,6 +222,7 @@ int finish_delayed_checkout(struct checkout *state, int *nr_checkouts) > ce = index_file_exists(state->istate, path->string, > strlen(path->string), 0); > if (ce) { > + display_progress(progress, ++processed_paths); > errs |= checkout_entry(ce, state, NULL, nr_checkouts); > filtered_bytes += ce->ce_stat_data.sd_size; > display_throughput(progress, filtered_bytes); > -- > 2.32.0.957.gd9e39d72fe6 >
On Fri, Jul 23, 2021 at 03:01:48PM -0700, Junio C Hamano wrote: > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > > > From: SZEDER Gábor <szeder.dev@gmail.com> > > > > The "Filtering content" progress in entry.c:finish_delayed_checkout() > > is unusual because of how it calculates the progress count and because > > it shows the progress of a nested loop. It works basically like this: > > > > start_delayed_progress(p, nr_of_paths_to_filter) > > for_each_filter { > > display_progress(p, nr_of_paths_to_filter - nr_of_paths_still_left_to_filter) > > for_each_path_handled_by_the_current_filter { > > checkout_entry() > > } > > } > > stop_progress(p) > > > > There are two issues with this approach: > > > > - The work done by the last filter (or the only filter if there is > > only one) is never counted, so if the last filter still has some > > paths to process, then the counter shown in the "done" progress > > line will not match the expected total. > > > > This would cause a BUG() in an upcoming change that adds an > > assertion checking if the "total" at the end matches the last > > progress bar update.. > > So the other series will semantically depend on this 3-patch series? > Just making sure that is the intended topic structure. > > > diff --git a/entry.c b/entry.c > > index 125fabdbd5..d92dd020b3 100644 > > --- a/entry.c > > +++ b/entry.c > > @@ -162,7 +162,7 @@ static int remove_available_paths(struct string_list_item *item, void *cb_data) > > int finish_delayed_checkout(struct checkout *state, int *nr_checkouts) > > { > > int errs = 0; > > - unsigned delayed_object_count; > > + unsigned processed_paths = 0; > > off_t filtered_bytes = 0; > > struct string_list_item *filter, *path; > > struct progress *progress; > > @@ -172,12 +172,10 @@ int finish_delayed_checkout(struct checkout *state, int *nr_checkouts) > > return errs; > > > > dco->state = CE_RETRY; > > - delayed_object_count = dco->paths.nr; > > - progress = start_delayed_progress(_("Filtering content"), delayed_object_count); > > + progress = start_delayed_progress(_("Filtering content"), dco->paths.nr); > > while (dco->filters.nr > 0) { > > for_each_string_list_item(filter, &dco->filters) { > > struct string_list available_paths = STRING_LIST_INIT_NODUP; > > - display_progress(progress, delayed_object_count - dco->paths.nr); > > > > if (!async_query_available_blobs(filter->string, &available_paths)) { > > /* Filter reported an error */ > > @@ -224,6 +222,7 @@ int finish_delayed_checkout(struct checkout *state, int *nr_checkouts) > > ce = index_file_exists(state->istate, path->string, > > strlen(path->string), 0); > > if (ce) { > > + display_progress(progress, ++processed_paths); > > errs |= checkout_entry(ce, state, NULL, nr_checkouts); > > filtered_bytes += ce->ce_stat_data.sd_size; > > display_throughput(progress, filtered_bytes); > > Hmph. A missing cache entries will not increment processed; would > that cause stop_progress() to see at the end the counter that is > smaller than dco->paths.nr we saw at the beginning? Yes, but this 'if (ce)' condition has an 'else errs = 1;' branch as well, i.e. a missing cache entry is considered an error. This patch fixes issues with this progress bar in the case when all went well, i.e. when there were no errors. In my original submission it is followed up by another patch that attempts to fix this progress line when there are errors, arguing that it's wrong to show "done" at the end when not all work was done because of said errors... although "fix" is not quite the right word for the approach taken in that patch, it's more like "papering over" ;) https://public-inbox.org/git/20210620200303.2328957-7-szeder.dev@gmail.com/
diff --git a/entry.c b/entry.c index 125fabdbd5..d92dd020b3 100644 --- a/entry.c +++ b/entry.c @@ -162,7 +162,7 @@ static int remove_available_paths(struct string_list_item *item, void *cb_data) int finish_delayed_checkout(struct checkout *state, int *nr_checkouts) { int errs = 0; - unsigned delayed_object_count; + unsigned processed_paths = 0; off_t filtered_bytes = 0; struct string_list_item *filter, *path; struct progress *progress; @@ -172,12 +172,10 @@ int finish_delayed_checkout(struct checkout *state, int *nr_checkouts) return errs; dco->state = CE_RETRY; - delayed_object_count = dco->paths.nr; - progress = start_delayed_progress(_("Filtering content"), delayed_object_count); + progress = start_delayed_progress(_("Filtering content"), dco->paths.nr); while (dco->filters.nr > 0) { for_each_string_list_item(filter, &dco->filters) { struct string_list available_paths = STRING_LIST_INIT_NODUP; - display_progress(progress, delayed_object_count - dco->paths.nr); if (!async_query_available_blobs(filter->string, &available_paths)) { /* Filter reported an error */ @@ -224,6 +222,7 @@ int finish_delayed_checkout(struct checkout *state, int *nr_checkouts) ce = index_file_exists(state->istate, path->string, strlen(path->string), 0); if (ce) { + display_progress(progress, ++processed_paths); errs |= checkout_entry(ce, state, NULL, nr_checkouts); filtered_bytes += ce->ce_stat_data.sd_size; display_throughput(progress, filtered_bytes);