diff mbox series

[v3,10/10] progress.c: add & assert a "global_progress" variable

Message ID patch-v3-10.10-01d5bbfce76-20211013T222329Z-avarab@gmail.com (mailing list archive)
State Superseded
Headers show
Series progress: assert "global_progress" + test fixes / cleanup | expand

Commit Message

Ævar Arnfjörð Bjarmason Oct. 13, 2021, 10:28 p.m. UTC
The progress.c code makes a hard assumption that only one progress bar
be active at a time (see [1] for a bug where this wasn't the
case). Add a BUG() that'll trigger if we ever regress on that promise
and have two progress bars active at the same time.

There was an alternative test-only approach to doing the same
thing[2], but by doing this outside of a GIT_TEST_* mode we'll know
we've put a hard stop to this particular API misuse.

It will also establish scaffolding to address current fundamental
limitations in the progress output: The current output must be
"driven" by calls to the likes of display_progress(). Once we have a
global current progress object we'll be able to update that object via
SIGALRM. See [3] for early code to do that.

It's conceivable that this change will hit the BUG() condition in some
scenario that we don't currently have tests for, this would be very
bad. If that happened we'd die just because we couldn't emit some
pretty output.

See [4] for a discussion of why our test coverage is lacking; our
progress display is hidden behind isatty(2) checks in many cases, so
the test suite doesn't cover it unless individual tests are run in
"--verbose" mode, we might also have multi-threaded use of the API, so
two progress bars stopping and starting would only be visible due to a
race condition.

Despite that, I think that this change won't introduce such
regressions, because:

 1. I've read all the code using the progress API (and have modified a
    large part of it in some WIP code I have). Almost all of it is really
    simple, the parts that aren't[5] are complex in the display_progress() part,
    not in starting or stopping the progress bar.

 2. The entire test suite passes when instrumented with an ad-hoc
    Linux-specific mode (it uses gettid()) to die if progress bars are
    ever started or stopped on anything but the main thread[6].

    Extending that to die if display_progress() is called in a thread
    reveals that we have exactly two users of the progress bar under
    threaded conditions, "git index-pack" and "git pack-objects". Both
    uses are straightforward, and they don't start/stop the progress
    bar when threads are active.

 3. I've likewise done an ad-hoc test to force progress bars to be
    displayed with:

        perl -pi -e 's[isatty\(2\)][1]g' $(git grep -l -F 'isatty(2)')

    I.e. to replace all checks (not just for progress) of checking
    whether STDERR is connected to a TTY, and then monkeypatching
    is_foreground_fd() in progress.c to always "return 1". Running the
    tests with those applied, interactively and under -V reveals via:

        $ grep -e set_progress_signal -e clear_progress_signal test-results/*out

    That nothing our tests cover hits the BUG conditions added here,
    except the expected "BUG: start two concurrent progress bars" test
    being added here.

    That isn't entirely true since we won't be getting 100% coverage
    due to cascading failures from tests that expected no progress
    output on stderr. To make sure I covered 100% I also tried making
    the display() function in progress.c a NOOP on top of that (it's
    the calls to start_progress_delay() and stop_progress()) that
    matter.

    That doesn't hit the BUG() either. Some tests fail in that mode
    due to a combination of the overzealous isatty(2) munging noted
    above, and the tests that are testing that the progress output
    itself is present (but for testing I'd made display() a NOOP).

Between those three points I think it's safe to go ahead with this
change.

1. 6f9d5f2fda1 (commit-graph: fix progress of reachable commits, 2020-07-09)
2. https://lore.kernel.org/git/20210620200303.2328957-3-szeder.dev@gmail.com
3. https://lore.kernel.org/git/patch-18.25-e21fc66623f-20210623T155626Z-avarab@gmail.com/
4. https://lore.kernel.org/git/cover-00.25-00000000000-20210623T155626Z-avarab@gmail.com/
5. b50c37aa44d (Merge branch 'ab/progress-users-adjust-counters' into
   next, 2021-09-10)
6. https://lore.kernel.org/git/877dffg37n.fsf@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 progress.c                  | 18 ++++++++++++++----
 t/t0500-progress-display.sh | 11 +++++++++++
 2 files changed, 25 insertions(+), 4 deletions(-)

Comments

Emily Shaffer Oct. 22, 2021, 3:42 a.m. UTC | #1
On Thu, Oct 14, 2021 at 12:28:26AM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> The progress.c code makes a hard assumption that only one progress bar
> be active at a time (see [1] for a bug where this wasn't the
> case). Add a BUG() that'll trigger if we ever regress on that promise
> and have two progress bars active at the same time.
> 
> There was an alternative test-only approach to doing the same
> thing[2], but by doing this outside of a GIT_TEST_* mode we'll know
> we've put a hard stop to this particular API misuse.
> 
> It will also establish scaffolding to address current fundamental
> limitations in the progress output: The current output must be
> "driven" by calls to the likes of display_progress(). Once we have a
> global current progress object we'll be able to update that object via
> SIGALRM. See [3] for early code to do that.
> 
> It's conceivable that this change will hit the BUG() condition in some
> scenario that we don't currently have tests for, this would be very
> bad. If that happened we'd die just because we couldn't emit some
> pretty output.
> 
> See [4] for a discussion of why our test coverage is lacking; our
> progress display is hidden behind isatty(2) checks in many cases, so
> the test suite doesn't cover it unless individual tests are run in
> "--verbose" mode, we might also have multi-threaded use of the API, so
> two progress bars stopping and starting would only be visible due to a
> race condition.
> 
> Despite that, I think that this change won't introduce such
> regressions, because:
> 
>  1. I've read all the code using the progress API (and have modified a
>     large part of it in some WIP code I have). Almost all of it is really
>     simple, the parts that aren't[5] are complex in the display_progress() part,
>     not in starting or stopping the progress bar.
> 
>  2. The entire test suite passes when instrumented with an ad-hoc
>     Linux-specific mode (it uses gettid()) to die if progress bars are
>     ever started or stopped on anything but the main thread[6].
> 
>     Extending that to die if display_progress() is called in a thread
>     reveals that we have exactly two users of the progress bar under
>     threaded conditions, "git index-pack" and "git pack-objects". Both
>     uses are straightforward, and they don't start/stop the progress
>     bar when threads are active.
> 
>  3. I've likewise done an ad-hoc test to force progress bars to be
>     displayed with:
> 
>         perl -pi -e 's[isatty\(2\)][1]g' $(git grep -l -F 'isatty(2)')
> 
>     I.e. to replace all checks (not just for progress) of checking
>     whether STDERR is connected to a TTY, and then monkeypatching
>     is_foreground_fd() in progress.c to always "return 1". Running the
>     tests with those applied, interactively and under -V reveals via:
> 
>         $ grep -e set_progress_signal -e clear_progress_signal test-results/*out
> 
>     That nothing our tests cover hits the BUG conditions added here,
>     except the expected "BUG: start two concurrent progress bars" test
>     being added here.
> 
>     That isn't entirely true since we won't be getting 100% coverage
>     due to cascading failures from tests that expected no progress
>     output on stderr. To make sure I covered 100% I also tried making
>     the display() function in progress.c a NOOP on top of that (it's
>     the calls to start_progress_delay() and stop_progress()) that
>     matter.
> 
>     That doesn't hit the BUG() either. Some tests fail in that mode
>     due to a combination of the overzealous isatty(2) munging noted
>     above, and the tests that are testing that the progress output
>     itself is present (but for testing I'd made display() a NOOP).
> 
> Between those three points I think it's safe to go ahead with this
> change.
> 
> 1. 6f9d5f2fda1 (commit-graph: fix progress of reachable commits, 2020-07-09)
> 2. https://lore.kernel.org/git/20210620200303.2328957-3-szeder.dev@gmail.com
> 3. https://lore.kernel.org/git/patch-18.25-e21fc66623f-20210623T155626Z-avarab@gmail.com/
> 4. https://lore.kernel.org/git/cover-00.25-00000000000-20210623T155626Z-avarab@gmail.com/
> 5. b50c37aa44d (Merge branch 'ab/progress-users-adjust-counters' into
>    next, 2021-09-10)
> 6. https://lore.kernel.org/git/877dffg37n.fsf@evledraar.gmail.com/
> 
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>

I find it much nicer to understand now, thanks. The BUG() change in
particular is excellent.

Reviewed-by: Emily Shaffer <emilyshaffer@google.com>
SZEDER Gábor Oct. 25, 2021, 5:02 a.m. UTC | #2
On Thu, Oct 14, 2021 at 12:28:26AM +0200, Ævar Arnfjörð Bjarmason wrote:
> The progress.c code makes a hard assumption that only one progress bar
> be active at a time (see [1] for a bug where this wasn't the
> case). Add a BUG() that'll trigger if we ever regress on that promise
> and have two progress bars active at the same time.

I still very much dislike the idea of a BUG() in the progress code
that can trigger outside of the test suite, because the progress line
is only a UI gimmick and not a crucial part of any Git operation, and
even though a progress line might be buggy, the underlying Git
operation is not affected by it and would still finish successfully,
as was the case with the dozen of so progress line bugs in the past.

> There was an alternative test-only approach to doing the same
> thing[2], but by doing this outside of a GIT_TEST_* mode we'll know
> we've put a hard stop to this particular API misuse.
> 
> It will also establish scaffolding to address current fundamental
> limitations in the progress output: The current output must be
> "driven" by calls to the likes of display_progress().

Please elaborate why that is a "fundamental limitation"; I don't see
any drawback of the current approach.

> Once we have a
> global current progress object we'll be able to update that object via
> SIGALRM.

What are the supposed benefits of doing that?  I do see its drawbacks,
considering that we have progress lines that are updated from multiple
threads.

> See [3] for early code to do that.
> 
> It's conceivable that this change will hit the BUG() condition in some
> scenario that we don't currently have tests for, this would be very
> bad. If that happened we'd die just because we couldn't emit some
> pretty output.
> 
> See [4] for a discussion of why our test coverage is lacking; our
> progress display is hidden behind isatty(2) checks in many cases, so
> the test suite doesn't cover it unless individual tests are run in
> "--verbose" mode, we might also have multi-threaded use of the API, so
> two progress bars stopping and starting would only be visible due to a
> race condition.
> 
> Despite that, I think that this change won't introduce such
> regressions, because:
> 
>  1. I've read all the code using the progress API (and have modified a
>     large part of it in some WIP code I have). Almost all of it is really
>     simple, the parts that aren't[5] are complex in the display_progress() part,
>     not in starting or stopping the progress bar.
> 
>  2. The entire test suite passes when instrumented with an ad-hoc
>     Linux-specific mode (it uses gettid()) to die if progress bars are
>     ever started or stopped on anything but the main thread[6].
> 
>     Extending that to die if display_progress() is called in a thread
>     reveals that we have exactly two users of the progress bar under
>     threaded conditions, "git index-pack" and "git pack-objects". Both
>     uses are straightforward, and they don't start/stop the progress
>     bar when threads are active.
> 
>  3. I've likewise done an ad-hoc test to force progress bars to be
>     displayed with:
> 
>         perl -pi -e 's[isatty\(2\)][1]g' $(git grep -l -F 'isatty(2)')
> 
>     I.e. to replace all checks (not just for progress) of checking
>     whether STDERR is connected to a TTY, and then monkeypatching
>     is_foreground_fd() in progress.c to always "return 1". Running the
>     tests with those applied, interactively and under -V reveals via:
> 
>         $ grep -e set_progress_signal -e clear_progress_signal test-results/*out
> 
>     That nothing our tests cover hits the BUG conditions added here,
>     except the expected "BUG: start two concurrent progress bars" test
>     being added here.
> 
>     That isn't entirely true since we won't be getting 100% coverage
>     due to cascading failures from tests that expected no progress
>     output on stderr. To make sure I covered 100% I also tried making
>     the display() function in progress.c a NOOP on top of that (it's
>     the calls to start_progress_delay() and stop_progress()) that
>     matter.
> 
>     That doesn't hit the BUG() either. Some tests fail in that mode
>     due to a combination of the overzealous isatty(2) munging noted
>     above, and the tests that are testing that the progress output
>     itself is present (but for testing I'd made display() a NOOP).
> 
> Between those three points I think it's safe to go ahead with this
> change.

The above analysis only considers _our_ _current_ codebase.

However, even though this might be safe now, it doesn't mean that it
will remain safe in the future, as we might add new progress lines
that lack test coverage (though hopefully won't), and would hit that
BUG() at a user.

Furthermore, even though this might be safe in our codebase, it
doesn't mean that it is safe in some 20+k forks of Git that exist on
GitHub alone (I for one have a git command or two with in my fork
which output progress lines but, sadly, have zero test coverage).

But more importantly, even though it might be safe to do so, that
doesn't mean that it's a good idea to do so.  The commit message does
little to justify why it is conceptually a good idea to add this BUG()
to the progress code in a way that it can trigger outside of the test
suite.


> 1. 6f9d5f2fda1 (commit-graph: fix progress of reachable commits, 2020-07-09)
> 2. https://lore.kernel.org/git/20210620200303.2328957-3-szeder.dev@gmail.com
> 3. https://lore.kernel.org/git/patch-18.25-e21fc66623f-20210623T155626Z-avarab@gmail.com/
> 4. https://lore.kernel.org/git/cover-00.25-00000000000-20210623T155626Z-avarab@gmail.com/
> 5. b50c37aa44d (Merge branch 'ab/progress-users-adjust-counters' into
>    next, 2021-09-10)
> 6. https://lore.kernel.org/git/877dffg37n.fsf@evledraar.gmail.com/
> 
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  progress.c                  | 18 ++++++++++++++----
>  t/t0500-progress-display.sh | 11 +++++++++++
>  2 files changed, 25 insertions(+), 4 deletions(-)
> 
> diff --git a/progress.c b/progress.c
> index b9369e9a264..a31500f8b2b 100644
> --- a/progress.c
> +++ b/progress.c
> @@ -46,6 +46,7 @@ struct progress {
>  };
>  
>  static volatile sig_atomic_t progress_update;
> +static struct progress *global_progress;
>  
>  /*
>   * These are only intended for testing the progress output, i.e. exclusively
> @@ -219,11 +220,16 @@ void progress_test_force_update(void)
>  	progress_interval(SIGALRM);
>  }
>  
> -static void set_progress_signal(void)
> +static void set_progress_signal(struct progress *progress)
>  {
>  	struct sigaction sa;
>  	struct itimerval v;
>  
> +	if (global_progress)
> +		BUG("'%s' progress still active when trying to start '%s'",
> +		    global_progress->title, progress->title);
> +	global_progress = progress;

This function is called set_progress_signal(), so checking and setting
'global_progress' feels out of place here; it would be better to do
that in start_progress_delay().

> +
>  	if (progress_testing)
>  		return;
>  
> @@ -241,10 +247,14 @@ static void set_progress_signal(void)
>  	setitimer(ITIMER_REAL, &v, NULL);
>  }
>  
> -static void clear_progress_signal(void)
> +static void clear_progress_signal(struct progress *progress)
>  {
>  	struct itimerval v = {{0,},};
>  
> +	if (!global_progress)
> +		BUG("should have active global_progress when cleaning up");
> +	global_progress = NULL;

Likewise.

>  	if (progress_testing)
>  		return;
>  
> @@ -268,7 +278,7 @@ static struct progress *start_progress_delay(const char *title, uint64_t total,
>  	strbuf_init(&progress->counters_sb, 0);
>  	progress->title_len = utf8_strwidth(title);
>  	progress->split = 0;
> -	set_progress_signal();
> +	set_progress_signal(progress);
>  	trace2_region_enter("progress", title, the_repository);
>  	return progress;
>  }
> @@ -372,7 +382,7 @@ void stop_progress_msg(struct progress **p_progress, const char *msg)
>  		display(progress, progress->last_value, buf);
>  		free(buf);
>  	}
> -	clear_progress_signal();
> +	clear_progress_signal(progress);
>  	strbuf_release(&progress->counters_sb);
>  	if (progress->throughput)
>  		strbuf_release(&progress->throughput->display);
> diff --git a/t/t0500-progress-display.sh b/t/t0500-progress-display.sh
> index 59e9f226ea4..867fdace3f2 100755
> --- a/t/t0500-progress-display.sh
> +++ b/t/t0500-progress-display.sh
> @@ -298,6 +298,17 @@ test_expect_success 'cover up after throughput shortens a lot' '
>  	test_cmp expect out
>  '
>  
> +test_expect_success 'BUG: start two concurrent progress bars' '
> +	cat >in <<-\EOF &&
> +	start 0 one
> +	start 0 two
> +	EOF
> +
> +	test_must_fail test-tool progress \
> +		<in 2>stderr &&
> +	grep "^BUG: .*'\''one'\'' progress still active when trying to start '\''two'\''$" stderr
> +'
> +
>  test_expect_success 'progress generates traces' '
>  	cat >in <<-\EOF &&
>  	start 40
> -- 
> 2.33.1.1346.g48288c3c089
>
Junio C Hamano Oct. 25, 2021, 9:38 a.m. UTC | #3
SZEDER Gábor <szeder.dev@gmail.com> writes:

> I still very much dislike the idea of a BUG() in the progress code
> that can trigger outside of the test suite, because the progress line
> is only a UI gimmick and not a crucial part of any Git operation, and
> even though a progress line might be buggy, the underlying Git
> operation is not affected by it and would still finish successfully,
> as was the case with the dozen of so progress line bugs in the past.

I too recall that we have fixed numerous bugs in the past year in
the area, but weren't they kind of obvious ones _once_ they are
pointed out at you (e.g. progress never reaching to 100%)?  Yet the
developers have failed to catch them because their eyes would coast
over without paying attention to them, exactly because the progress
bar is merely a UI gimmick.

I haven't formed a firm opinion on this yet, but I think the idea
behind these BUG() is to help such problems be caught while they are
still in the lab.  You may not notice when your live progress bar
behaved a bit funny, but if you hit a BUG(), that would be squarely
in your face and you cannot ignore it.
Ævar Arnfjörð Bjarmason Oct. 25, 2021, 11:06 a.m. UTC | #4
On Mon, Oct 25 2021, SZEDER Gábor wrote:

> On Thu, Oct 14, 2021 at 12:28:26AM +0200, Ævar Arnfjörð Bjarmason wrote:
>> The progress.c code makes a hard assumption that only one progress bar
>> be active at a time (see [1] for a bug where this wasn't the
>> case). Add a BUG() that'll trigger if we ever regress on that promise
>> and have two progress bars active at the same time.
>
> I still very much dislike the idea of a BUG() in the progress code
> that can trigger outside of the test suite, because the progress line
> is only a UI gimmick and not a crucial part of any Git operation, and
> even though a progress line might be buggy, the underlying Git
> operation is not affected by it and would still finish successfully,
> as was the case with the dozen of so progress line bugs in the past.
>
>> There was an alternative test-only approach to doing the same
>> thing[2], but by doing this outside of a GIT_TEST_* mode we'll know
>> we've put a hard stop to this particular API misuse.
>> 
>> It will also establish scaffolding to address current fundamental
>> limitations in the progress output: The current output must be
>> "driven" by calls to the likes of display_progress().
>
> Please elaborate why that is a "fundamental limitation"; I don't see
> any drawback of the current approach.
>
>> Once we have a
>> global current progress object we'll be able to update that object via
>> SIGALRM.
>
> What are the supposed benefits of doing that?  I do see its drawbacks,
> considering that we have progress lines that are updated from multiple
> threads.

I've updated the commit messages in a re-roll I have incoming to
hopefully clear this up.

>> See [3] for early code to do that.
>> 
>> It's conceivable that this change will hit the BUG() condition in some
>> scenario that we don't currently have tests for, this would be very
>> bad. If that happened we'd die just because we couldn't emit some
>> pretty output.
>> 
>> See [4] for a discussion of why our test coverage is lacking; our
>> progress display is hidden behind isatty(2) checks in many cases, so
>> the test suite doesn't cover it unless individual tests are run in
>> "--verbose" mode, we might also have multi-threaded use of the API, so
>> two progress bars stopping and starting would only be visible due to a
>> race condition.
>> 
>> Despite that, I think that this change won't introduce such
>> regressions, because:
>> 
>>  1. I've read all the code using the progress API (and have modified a
>>     large part of it in some WIP code I have). Almost all of it is really
>>     simple, the parts that aren't[5] are complex in the display_progress() part,
>>     not in starting or stopping the progress bar.
>> 
>>  2. The entire test suite passes when instrumented with an ad-hoc
>>     Linux-specific mode (it uses gettid()) to die if progress bars are
>>     ever started or stopped on anything but the main thread[6].
>> 
>>     Extending that to die if display_progress() is called in a thread
>>     reveals that we have exactly two users of the progress bar under
>>     threaded conditions, "git index-pack" and "git pack-objects". Both
>>     uses are straightforward, and they don't start/stop the progress
>>     bar when threads are active.
>> 
>>  3. I've likewise done an ad-hoc test to force progress bars to be
>>     displayed with:
>> 
>>         perl -pi -e 's[isatty\(2\)][1]g' $(git grep -l -F 'isatty(2)')
>> 
>>     I.e. to replace all checks (not just for progress) of checking
>>     whether STDERR is connected to a TTY, and then monkeypatching
>>     is_foreground_fd() in progress.c to always "return 1". Running the
>>     tests with those applied, interactively and under -V reveals via:
>> 
>>         $ grep -e set_progress_signal -e clear_progress_signal test-results/*out
>> 
>>     That nothing our tests cover hits the BUG conditions added here,
>>     except the expected "BUG: start two concurrent progress bars" test
>>     being added here.
>> 
>>     That isn't entirely true since we won't be getting 100% coverage
>>     due to cascading failures from tests that expected no progress
>>     output on stderr. To make sure I covered 100% I also tried making
>>     the display() function in progress.c a NOOP on top of that (it's
>>     the calls to start_progress_delay() and stop_progress()) that
>>     matter.
>> 
>>     That doesn't hit the BUG() either. Some tests fail in that mode
>>     due to a combination of the overzealous isatty(2) munging noted
>>     above, and the tests that are testing that the progress output
>>     itself is present (but for testing I'd made display() a NOOP).
>> 
>> Between those three points I think it's safe to go ahead with this
>> change.
>
> The above analysis only considers _our_ _current_ codebase.
>
> However, even though this might be safe now, it doesn't mean that it
> will remain safe in the future, as we might add new progress lines
> that lack test coverage (though hopefully won't), and would hit that
> BUG() at a user.
>
> Furthermore, even though this might be safe in our codebase, it
> doesn't mean that it is safe in some 20+k forks of Git that exist on
> GitHub alone (I for one have a git command or two with in my fork
> which output progress lines but, sadly, have zero test coverage).
>
> But more importantly, even though it might be safe to do so, that
> doesn't mean that it's a good idea to do so.  The commit message does
> little to justify why it is conceptually a good idea to add this BUG()
> to the progress code in a way that it can trigger outside of the test
> suite.

I think partially I've addressed this above (i.e. in the incoming
re-roll's update commit message), but I might not for the question of
whether this is worth it overall. I'll update the commit message to
address this specific case, which was missing from it.

>> 1. 6f9d5f2fda1 (commit-graph: fix progress of reachable commits, 2020-07-09)
>> 2. https://lore.kernel.org/git/20210620200303.2328957-3-szeder.dev@gmail.com
>> 3. https://lore.kernel.org/git/patch-18.25-e21fc66623f-20210623T155626Z-avarab@gmail.com/
>> 4. https://lore.kernel.org/git/cover-00.25-00000000000-20210623T155626Z-avarab@gmail.com/
>> 5. b50c37aa44d (Merge branch 'ab/progress-users-adjust-counters' into
>>    next, 2021-09-10)
>> 6. https://lore.kernel.org/git/877dffg37n.fsf@evledraar.gmail.com/
>> 
>> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>> ---
>>  progress.c                  | 18 ++++++++++++++----
>>  t/t0500-progress-display.sh | 11 +++++++++++
>>  2 files changed, 25 insertions(+), 4 deletions(-)
>> 
>> diff --git a/progress.c b/progress.c
>> index b9369e9a264..a31500f8b2b 100644
>> --- a/progress.c
>> +++ b/progress.c
>> @@ -46,6 +46,7 @@ struct progress {
>>  };
>>  
>>  static volatile sig_atomic_t progress_update;
>> +static struct progress *global_progress;
>>  
>>  /*
>>   * These are only intended for testing the progress output, i.e. exclusively
>> @@ -219,11 +220,16 @@ void progress_test_force_update(void)
>>  	progress_interval(SIGALRM);
>>  }
>>  
>> -static void set_progress_signal(void)
>> +static void set_progress_signal(struct progress *progress)
>>  {
>>  	struct sigaction sa;
>>  	struct itimerval v;
>>  
>> +	if (global_progress)
>> +		BUG("'%s' progress still active when trying to start '%s'",
>> +		    global_progress->title, progress->title);
>> +	global_progress = progress;
>
> This function is called set_progress_signal(), so checking and setting
> 'global_progress' feels out of place here; it would be better to do
> that in start_progress_delay().
>
>> +
>>  	if (progress_testing)
>>  		return;
>>  
>> @@ -241,10 +247,14 @@ static void set_progress_signal(void)
>>  	setitimer(ITIMER_REAL, &v, NULL);
>>  }
>>  
>> -static void clear_progress_signal(void)
>> +static void clear_progress_signal(struct progress *progress)
>>  {
>>  	struct itimerval v = {{0,},};
>>  
>> +	if (!global_progress)
>> +		BUG("should have active global_progress when cleaning up");
>> +	global_progress = NULL;
>
> Likewise.

*Nod* cleaned up this whole part, which became much simpler overall as a
result, thank you.
SZEDER Gábor Dec. 2, 2021, 11:14 p.m. UTC | #5
On Mon, Oct 25, 2021 at 02:38:04AM -0700, Junio C Hamano wrote:
> SZEDER Gábor <szeder.dev@gmail.com> writes:
> 
> > I still very much dislike the idea of a BUG() in the progress code
> > that can trigger outside of the test suite, because the progress line
> > is only a UI gimmick and not a crucial part of any Git operation, and
> > even though a progress line might be buggy, the underlying Git
> > operation is not affected by it and would still finish successfully,
> > as was the case with the dozen of so progress line bugs in the past.
> 
> I too recall that we have fixed numerous bugs in the past year in
> the area, but weren't they kind of obvious ones _once_ they are
> pointed out at you (e.g. progress never reaching to 100%)?  Yet the
> developers have failed to catch them because their eyes would coast
> over without paying attention to them, exactly because the progress
> bar is merely a UI gimmick.
> 
> I haven't formed a firm opinion on this yet, but I think the idea
> behind these BUG() is to help such problems be caught while they are
> still in the lab.  You may not notice when your live progress bar
> behaved a bit funny, but if you hit a BUG(), that would be squarely
> in your face and you cannot ignore it.

The "outside of the test suite" part is important.  Running the test
suite with a GIT_TEST_CHECK_PROGRESS knob is sufficient to catch these
issues as my inital two patch series have shown at the very beginning
of this thread before Ævar hijacked them.  Interested Git developers
can even enable it in their .profile if they wish; I did so with
GIT_TEST_SPLIT_INDEX for a while to expose some of my patches to more
real-world use.

However, I think it's conceptually a bad idea to abort with a BUG() an
otherwise successful Git operation by default, when that bug happens
to be in such an ancillary component as the progress display, and find
the justifications given in the commit message unconvincing.
Ævar Arnfjörð Bjarmason Dec. 3, 2021, 10:29 a.m. UTC | #6
On Fri, Dec 03 2021, SZEDER Gábor wrote:

> On Mon, Oct 25, 2021 at 02:38:04AM -0700, Junio C Hamano wrote:
>> SZEDER Gábor <szeder.dev@gmail.com> writes:
>> 
>> > I still very much dislike the idea of a BUG() in the progress code
>> > that can trigger outside of the test suite, because the progress line
>> > is only a UI gimmick and not a crucial part of any Git operation, and
>> > even though a progress line might be buggy, the underlying Git
>> > operation is not affected by it and would still finish successfully,
>> > as was the case with the dozen of so progress line bugs in the past.
>> 
>> I too recall that we have fixed numerous bugs in the past year in
>> the area, but weren't they kind of obvious ones _once_ they are
>> pointed out at you (e.g. progress never reaching to 100%)?  Yet the
>> developers have failed to catch them because their eyes would coast
>> over without paying attention to them, exactly because the progress
>> bar is merely a UI gimmick.
>> 
>> I haven't formed a firm opinion on this yet, but I think the idea
>> behind these BUG() is to help such problems be caught while they are
>> still in the lab.  You may not notice when your live progress bar
>> behaved a bit funny, but if you hit a BUG(), that would be squarely
>> in your face and you cannot ignore it.
>
> The "outside of the test suite" part is important.  Running the test
> suite with a GIT_TEST_CHECK_PROGRESS knob is sufficient to catch these
> issues as my inital two patch series have shown at the very beginning
> of this thread before Ævar hijacked them.

FWIW I didn't hijack and adjust your patches to be BUG() instead of a
test mode. We came up with more-or-less the same thing
independently around the same time.

You submitted yours on the 20th of June, but I hacked up mine on top af
on already-WIP series I had ~5 days earlier in response to another
discussion about progress.c[1]. You submitter yours first, and I then
replied to the effect of "here's a similar thing I had locally, RFC"[2].

Or do you mean the smaller "fix bogus counting" fixed that are now
landed[3]? I just wanted to push that forward since it seemed you
weren't per[4]. Sorry if I stepped on any toes there...

> Interested Git developers
> can even enable it in their .profile if they wish; I did so with
> GIT_TEST_SPLIT_INDEX for a while to expose some of my patches to more
> real-world use.
>
> However, I think it's conceptually a bad idea to abort with a BUG() an
> otherwise successful Git operation by default, when that bug happens
> to be in such an ancillary component as the progress display, and find
> the justifications given in the commit message unconvincing.

I think it's important to get to the root of what we really disagree
about here.

We hard die on all sorts of trivial things in BUG() already, as paging
through this shows:

    git grep -w -F 'BUG('

I.e. the general criteria for BUG() isn't whether we could
hypothetically recover from the scenario.

Which, if you adjust the code around many existing BUG()s to trigger the
"bad thing" (as it were..) is the case. I haven't done any exhaustive
checking, but I'd bet it's more common than not, or at least something
like 1/4 or 1/8 of them.

Rather it's a combination of:

 A. This should absolutely not happen
 B. We are sure (or are sure as we can be) that it doesn't
 C. We think if it does, we'll know/really want to know via tests/early integration testing.
 D. We mark it with a BUG() because we're as confident as we reasonably can be in A and B

I took your earlier comments to mean that you agreed on "A", but you
weren't so sure about "B". I agree that without confidence about "B"
that having "D" would be overzealous.

But here with your mention of "conceptually a bad idea to[...]" I'm not
so sure, i.e. it seems like you're saying that you're categorically
opposed to such use of BUG(). I think that's a fair stance to take if
that's what you're going for, but per the above I also think it's true
to say that a lot of our existing use of that API doesn't match that
worldview. I.e. it's not an UNRECOVARABLE_BUG(), but
SHOULD_NOT_ESCAPE_DEVELOPMENT_BUG().

I think that in this case we should be confident that we've got "B", as
explained in detail in the upthread patch[5].

I would be exceedingly paranoid about more exhaustive checks, e.g. I
don't think we've got enough coverage to have BUG() checks for some of
the things you added test-only asserts (which would be good) for in your
initial series.

I.e. asserting whether we hit 99% and not 100%, whether
display_progress() "misses" updates, or if we miss a call to
stop_progress(), or whatever else one might assert.

I've been running my personal git with many of those sorts of pedantic
assertions for a while now, but I'm still not confident those changes
are OK to submit for inclusion. I very much share your concerns in that
area.

But I think (and [5] argues) that it's *much* easier to be confident
about the narrow BUG() assertion being added here.

I.e. it's trivial to discover that almost all callers of progress.c are
never going to start more than one progress bar ever, so we can discard
those out of hand. For the rest they're rather easily auditable.

But clearly you disagree, and I thinks it helps neither of us to just
repeat our points, which is not what I'm trying to do here, but to get
to the following:

 - Can you cite specifically what about [5] you think wouldn't catch any
   cases where we couldn't be certain about the BUG(). I.e. it outlines
   the sort of testing I did, which involved forcing all hidden progress bars
   to ignore their existing isatty() checks, adding more pedantic asserts
   to narrow down potentially affected callers etc.

   Do you think that testing could have missed something? How & why?

 - Related to that, are there cases where you would agree that we've got "B",
   as opposed to others where we don't?

   As I've argued I do think these patches are ready as-is, but one alternative
   way forward with them would be to add the BUG() to everything, but whitelist
   those current callers that have >1 progress bar. Then as a (slow) follow-up
   proceed to un-whitelist those one at a time.

I'm not familiar with the GIT_TEST_SPLIT_INDEX changes you're
mentioning, but I'd think that those cases would have been more complex
(the code flow around index-related stuff) than the progress API users.

1. https://lore.kernel.org/git/87o8c8z105.fsf@evledraar.gmail.com/
2. https://lore.kernel.org/git/cover-00.25-00000000000-20210623T155626Z-avarab@gmail.com/
3. https://lore.kernel.org/git/cover-v4-0.2-00000000000-20210909T010722Z-avarab@gmail.com/
4. https://lore.kernel.org/git/cover-0.3-0000000000-20210722T121801Z-avarab@gmail.com/
5. https://lore.kernel.org/git/patch-v6-8.8-bff919994b5-20211102T122507Z-avarab@gmail.com/
diff mbox series

Patch

diff --git a/progress.c b/progress.c
index b9369e9a264..a31500f8b2b 100644
--- a/progress.c
+++ b/progress.c
@@ -46,6 +46,7 @@  struct progress {
 };
 
 static volatile sig_atomic_t progress_update;
+static struct progress *global_progress;
 
 /*
  * These are only intended for testing the progress output, i.e. exclusively
@@ -219,11 +220,16 @@  void progress_test_force_update(void)
 	progress_interval(SIGALRM);
 }
 
-static void set_progress_signal(void)
+static void set_progress_signal(struct progress *progress)
 {
 	struct sigaction sa;
 	struct itimerval v;
 
+	if (global_progress)
+		BUG("'%s' progress still active when trying to start '%s'",
+		    global_progress->title, progress->title);
+	global_progress = progress;
+
 	if (progress_testing)
 		return;
 
@@ -241,10 +247,14 @@  static void set_progress_signal(void)
 	setitimer(ITIMER_REAL, &v, NULL);
 }
 
-static void clear_progress_signal(void)
+static void clear_progress_signal(struct progress *progress)
 {
 	struct itimerval v = {{0,},};
 
+	if (!global_progress)
+		BUG("should have active global_progress when cleaning up");
+	global_progress = NULL;
+
 	if (progress_testing)
 		return;
 
@@ -268,7 +278,7 @@  static struct progress *start_progress_delay(const char *title, uint64_t total,
 	strbuf_init(&progress->counters_sb, 0);
 	progress->title_len = utf8_strwidth(title);
 	progress->split = 0;
-	set_progress_signal();
+	set_progress_signal(progress);
 	trace2_region_enter("progress", title, the_repository);
 	return progress;
 }
@@ -372,7 +382,7 @@  void stop_progress_msg(struct progress **p_progress, const char *msg)
 		display(progress, progress->last_value, buf);
 		free(buf);
 	}
-	clear_progress_signal();
+	clear_progress_signal(progress);
 	strbuf_release(&progress->counters_sb);
 	if (progress->throughput)
 		strbuf_release(&progress->throughput->display);
diff --git a/t/t0500-progress-display.sh b/t/t0500-progress-display.sh
index 59e9f226ea4..867fdace3f2 100755
--- a/t/t0500-progress-display.sh
+++ b/t/t0500-progress-display.sh
@@ -298,6 +298,17 @@  test_expect_success 'cover up after throughput shortens a lot' '
 	test_cmp expect out
 '
 
+test_expect_success 'BUG: start two concurrent progress bars' '
+	cat >in <<-\EOF &&
+	start 0 one
+	start 0 two
+	EOF
+
+	test_must_fail test-tool progress \
+		<in 2>stderr &&
+	grep "^BUG: .*'\''one'\'' progress still active when trying to start '\''two'\''$" stderr
+'
+
 test_expect_success 'progress generates traces' '
 	cat >in <<-\EOF &&
 	start 40