diff mbox series

[8/8] progress.c: add & assert a "global_progress" variable

Message ID patch-8.8-1a14176bee-20210722T125012Z-avarab@gmail.com (mailing list archive)
State New, archived
Headers show
Series progress: assert "global_progress" + test fixes / cleanup | expand

Commit Message

Ævar Arnfjörð Bjarmason July 22, 2021, 12:55 p.m. UTC
The progress.c code makes a hard assumption that only one progress bar
be active at a time (see [1] for a bug where this wasn't the case),
but nothing has asserted that that's the case. Let's add a BUG()
that'll trigger if two progress bars are active at the same time.

There's an alternate test-only approach to doing the same thing[2],
but by doing this for all progress bars we'll have a canary to check
if we have any unexpected interaction between the "sig_atomic_t
progress_update" variable and this global struct.

I am then planning on using this scaffolding in the future to fix a
limitation in the progress output, namely the current limitation of
the progress.c bar code that any update must pro-actively go through
the likes of display_progress().

If we e.g. hang forever before the first display_progress(), or in the
middle of a loop that would call display_progress() the user will only
see either no output, or output frozen at the last display_progress()
that would have done an update (e.g. in cases where progress_update
was "1" due to an earlier signal).

This change does not fix that, but sets up the structure for solving
that and other related problems by juggling this "global_progress"
struct. Later changes will make more use of the "global_progress" than
only using it for these assertions.

1. 6f9d5f2fda1 (commit-graph: fix progress of reachable commits, 2020-07-09)
2. https://lore.kernel.org/git/20210620200303.2328957-3-szeder.dev@gmail.com

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 progress.c                  | 17 +++++++++++++----
 t/t0500-progress-display.sh | 11 +++++++++++
 2 files changed, 24 insertions(+), 4 deletions(-)

Comments

Ævar Arnfjörð Bjarmason Sept. 16, 2021, 9:34 p.m. UTC | #1
I've adjusted the In-reply-to header here.

This is really in reply to <20210916183137.GD76263@szeder.dev>, but
that's a reply to a previous & longer RFC-ish iteration of the
series. https://lore.kernel.org/git/cover-0.8-0000000000-20210722T125012Z-avarab@gmail.com/
is what's currently queued as ab/only-single-progress-at-once in "seen".

Let's continue the discussion relevant to the currently proposed patches
in this thread.

On Thu, Sep 16 2021, SZEDER Gábor wrote:

> On Wed, Jun 23, 2021 at 07:48:12PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> The progress.c code makes a hard assumption that only one progress bar
>> be active at a time (see [1] for a bug where this wasn't the case),
>> but nothing has asserted that that's the case. Let's add a BUG()
>> that'll trigger if two progress bars are active at the same time.
>
> I very much dislike the idea of any BUG() in the progress code that
> can trigger outside of the test suite.

First regarding the state of this series. I'd understood your:

    Please don't advance this to next yet.  I've found some issues with
    it, but not the time to raise them.

In https://lore.kernel.org/git/20210901050406.GB76263@szeder.dev/ to
mean (, by perhaps overly reading between the lines out of paranoia,)
that you'd found some case where we'd hit the BUG() here.

Am I correct that you haven't, but are concerned that we've left some
case undiscovered where we might?

I'm wondering given your replies there if you thought that the series to
be merged down was that 25 patch one. I agree (e.g. because of your
https://lore.kernel.org/git/20210916183711.GE76263@szeder.dev/) that
merging that one down wouldn't be a good idea in its current state.

Of course you may still have valid concerns etc., just trying to clarify
if there's some specific known outstanding issue or not with the current
8-patch series in "seen".

> As the number of progress-related fixes clearly show, we often misuse
> the progress API, and, arguably, a bug is a bug is a bug, so strictly
> speaking a BUG() is not wrong here.
>
> However, the progress line is merely a UI gimmick, not a crucial part
> of Git, and none of those progress bugs affected the correctness of
> the operation itself.  Worse, calling BUG() during some operations
> (e.g. 'git commit-graph write', the worst offender when it comes to
> progress bugs) can leave a lockfile behind, resulting in scary errors
> and requiring manual cleanup in the .git directory, which is a much
> worse UX than showing some bogus values or out of order progress
> lines.

Yes, I agree that actually hitting this BUG() would absolutely suck, and
that we shouldn't consider this patch if we weren't certain, or near
enough, that we wouldn't hit it.

As covered in the cover letter of the earlier series I sent (at
https://lore.kernel.org/git/cover-00.25-00000000000-20210623T155626Z-avarab@gmail.com/)
I share your concerns that it's hard to uncover if we've reached
sufficient coverage to be certain that we should add certain assertions,
i.e. some of the ones your initial series added around the actual
progress counting that we do.

In this case though, I do think we can safely add this. Maybe you have
run into the BUG() and I'm about to get some egg on my face, but here's
why I think we won't hit it.

This is a *much* narrower case than the general thread quagmire around
display_progress() etc., since this only covers the start/stop progress
calls. All of the multi-threaded code we have does the equivalent of:

    p = start_progress();
    start_threads();
    /* do stuff in threads, including with "p" */
    stop_threads();
    stop_progress(p);

Do you agree that if this is the pattern everywhere that this patch
would be safe in its current form?

To assert that that's the case (I'd read the code before) I instrumented
the tests to BUG() out if we ever start() or stop() where getpid() !=
gettid(), which on Linux means you're inside a pthread. The diff-on-top
is at the end of this E-Mail.

With that and running the tests with:

    GIT_TEST_BUG_START=1 \
    GIT_TEST_BUG_STOP=1

We pass all tests, i.e. there's no current callers that call
start_progress() or start_progress() and do so in anything but the main
program thread.

Well, there could be, per the concerns I had in the CL linked above,
i.e. sometimes that start_progress() is guarded by a preceding isatty()
or whatever, but having looked at those / grepped the union of pthread +
progress in the source I think there's no such cases. That's *much*
easier to eyeball than "does all this control flow around
display_progress() make sense?".

Getting back on topic, we pass all tests with that, but we'll fail with:

    GIT_TEST_BUG_DISPLAY=1 \
    GIT_TEST_BUG_DISPLAY_HARDER=1

Why the second variable? Because I marked up the two callers that call
display_progress() within threads, it's just index-pack and
pack-objects. If you don't set GIT_TEST_BUG_DISPLAY_HARDER=1 we'll
whitelist those two, and all tests pass.

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 6cc48902170..05c82dc6e6d 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1229,6 +1229,7 @@ static void resolve_deltas(void)
 	QSORT(ofs_deltas, nr_ofs_deltas, compare_ofs_delta_entry);
 	QSORT(ref_deltas, nr_ref_deltas, compare_ref_delta_entry);
 
+	no_progress_bug();
 	if (verbose || show_resolving_progress)
 		progress = start_progress(_("Resolving deltas"),
 					  nr_ref_deltas + nr_ofs_deltas);
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index ec8503563a6..97ad321f67c 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3072,6 +3072,7 @@ static void prepare_pack(int window, int depth)
 	if (nr_deltas && n > 1) {
 		unsigned nr_done = 0;
 
+		no_progress_bug();
 		if (progress)
 			progress_state = start_progress(_("Compressing objects"),
 							nr_deltas);
diff --git a/ci/run-build-and-tests.sh b/ci/run-build-and-tests.sh
index f3aba5d6cbb..efa1e5e5bdf 100755
--- a/ci/run-build-and-tests.sh
+++ b/ci/run-build-and-tests.sh
@@ -5,6 +5,9 @@
 
 . ${0%/*}/lib.sh
 
+export GIT_TEST_BUG_START=true
+export GIT_TEST_BUG_STOP=true
+
 case "$CI_OS_NAME" in
 windows*) cmd //c mklink //j t\\.prove "$(cygpath -aw "$cache_dir/.prove")";;
 *) ln -s "$cache_dir/.prove" t/.prove;;
diff --git a/progress.c b/progress.c
index 14a023f4b43..bdebf100ad8 100644
--- a/progress.c
+++ b/progress.c
@@ -16,6 +16,7 @@
 #include "trace.h"
 #include "utf8.h"
 #include "config.h"
+#include <unistd.h>
 
 #define TP_IDX_MAX      8
 
@@ -202,8 +203,18 @@ void display_throughput(struct progress *progress, uint64_t total)
 		display(progress, progress->last_value, NULL);
 }
 
+static int no_bug_please;
+void no_progress_bug(void)
+{
+	no_bug_please = 1;
+}
+
 void display_progress(struct progress *progress, uint64_t n)
 {
+	if (getpid() != gettid() && getenv("GIT_TEST_BUG_DISPLAY") &&
+	    (getenv("GIT_TEST_BUG_DISPLAY_HARDER") || !no_bug_please))
+		BUG("display: pid = %d, tid = %d: %s\n", getpid(), gettid(),
+		    progress ? progress->title : "N/A");
 	if (progress)
 		display(progress, n, NULL);
 }
@@ -281,6 +292,11 @@ static struct progress *start_progress_delay(const char *title, uint64_t total,
 	progress->split = 0;
 	set_progress_signal(progress);
 	trace2_region_enter("progress", title, the_repository);
+
+	if (getpid() != gettid() && getenv("GIT_TEST_BUG_START"))
+		BUG("start: pid = %d, tid = %d: %s\n", getpid(), gettid(),
+		    title ? title : "N/A");
+
 	return progress;
 }
 
@@ -334,6 +350,11 @@ static void finish_if_sparse(struct progress *progress)
 
 void stop_progress(struct progress **p_progress)
 {
+	no_bug_please = 0;
+	if (getpid() != gettid() && getenv("GIT_TEST_BUG_STOP"))
+		BUG("stop: pid = %d, tid = %d: %s\n", getpid(), gettid(),
+		    ((p_progress && *p_progress) ? (*p_progress)->title : "N/A"));
+
 	if (!p_progress)
 		BUG("don't provide NULL to stop_progress");
 
diff --git a/progress.h b/progress.h
index f1913acf73f..2ebb1da2666 100644
--- a/progress.h
+++ b/progress.h
@@ -20,5 +20,6 @@ struct progress *start_delayed_sparse_progress(const char *title,
 					       uint64_t total);
 void stop_progress(struct progress **progress);
 void stop_progress_msg(struct progress **progress, const char *msg);
+void no_progress_bug(void);
 
 #endif
diff mbox series

Patch

diff --git a/progress.c b/progress.c
index 1ab7d19deb..14a023f4b4 100644
--- a/progress.c
+++ b/progress.c
@@ -46,6 +46,7 @@  struct progress {
 };
 
 static volatile sig_atomic_t progress_update;
+static struct progress *global_progress;
 
 /*
  * These are only intended for testing the progress output, i.e. exclusively
@@ -221,11 +222,15 @@  void progress_test_force_update(void)
 	progress_interval(SIGALRM);
 }
 
-static void set_progress_signal(void)
+static void set_progress_signal(struct progress *progress)
 {
 	struct sigaction sa;
 	struct itimerval v;
 
+	if (global_progress)
+		BUG("should have no global_progress in set_progress_signal()");
+	global_progress = progress;
+
 	if (progress_testing)
 		return;
 
@@ -243,10 +248,14 @@  static void set_progress_signal(void)
 	setitimer(ITIMER_REAL, &v, NULL);
 }
 
-static void clear_progress_signal(void)
+static void clear_progress_signal(struct progress *progress)
 {
 	struct itimerval v = {{0,},};
 
+	if (!global_progress)
+		BUG("should have a global_progress in clear_progress_signal()");
+	global_progress = NULL;
+
 	if (progress_testing)
 		return;
 
@@ -270,7 +279,7 @@  static struct progress *start_progress_delay(const char *title, uint64_t total,
 	strbuf_init(&progress->counters_sb, 0);
 	progress->title_len = utf8_strwidth(title);
 	progress->split = 0;
-	set_progress_signal();
+	set_progress_signal(progress);
 	trace2_region_enter("progress", title, the_repository);
 	return progress;
 }
@@ -374,7 +383,7 @@  void stop_progress_msg(struct progress **p_progress, const char *msg)
 		display(progress, progress->last_value, buf);
 		free(buf);
 	}
-	clear_progress_signal();
+	clear_progress_signal(progress);
 	strbuf_release(&progress->counters_sb);
 	if (progress->throughput)
 		strbuf_release(&progress->throughput->display);
diff --git a/t/t0500-progress-display.sh b/t/t0500-progress-display.sh
index ffa819ca1d..124d33c96b 100755
--- a/t/t0500-progress-display.sh
+++ b/t/t0500-progress-display.sh
@@ -296,6 +296,17 @@  test_expect_success 'cover up after throughput shortens a lot' '
 	test_cmp expect out
 '
 
+test_expect_success 'BUG: start two concurrent progress bars' '
+	cat >in <<-\EOF &&
+	start 0 one
+	start 0 two
+	EOF
+
+	test_must_fail test-tool progress \
+		<in 2>stderr &&
+	grep -E "^BUG: .*: should have no global_progress in set_progress_signal\(\)$" stderr
+'
+
 test_expect_success 'progress generates traces' '
 	cat >in <<-\EOF &&
 	start 40