diff mbox series

[4/4] usage.c: add a non-fatal bug() function to go with BUG()

Message ID patch-4.5-515d146cac8-20210328T022343Z-avarab@gmail.com (mailing list archive)
State New, archived
Headers show
Series usage.c: add a non-fatal bug() + misc doc fixes | expand

Commit Message

Ævar Arnfjörð Bjarmason March 28, 2021, 2:26 a.m. UTC
Add a bug() function that works like error() except the message is
prefixed with "bug:" instead of "error:".

The reason this is needed is for e.g. the fsck code. If we encounter
what we'd consider a BUG() in the middle of fsck traversal we'd still
like to try as hard as possible to go past that object and complete
the fsck, instead of hard dying. A follow-up commit will introduce
such a use in object-file.c.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 .../technical/api-error-handling.txt          |  8 ++++-
 Documentation/technical/api-trace2.txt        |  4 +--
 git-compat-util.h                             |  3 ++
 run-command.c                                 | 11 +++++++
 t/helper/test-trace2.c                        | 14 +++++++--
 t/t0210-trace2-normal.sh                      | 19 ++++++++++++
 usage.c                                       | 29 +++++++++++++++++++
 7 files changed, 83 insertions(+), 5 deletions(-)

Comments

Junio C Hamano March 28, 2021, 6:12 a.m. UTC | #1
Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Add a bug() function that works like error() except the message is
> prefixed with "bug:" instead of "error:".
>
> The reason this is needed is for e.g. the fsck code. If we encounter
> what we'd consider a BUG() in the middle of fsck traversal we'd still
> like to try as hard as possible to go past that object and complete
> the fsck, instead of hard dying. A follow-up commit will introduce
> such a use in object-file.c.

Reading the description above, i.e. "to go past that object", the
assumed use case seems to be to deal with a data error, not a
program bug (which is where we use BUG()---e.g. one helper function
in the fsck code detected that the caller wasn't careful enough to
vet the data it has and called it with incoherent data).  If we find
a tree entry whose mode bits implies that the object recorded in the
entry ought to be a blob, and later find out that the object turns
out to be a tree, that is a corrupt repository and the code that
detected is not buggy (and we shouldn't use BUG(), of course).

So, ... I am skeptical.  If the code is prepared to handle breakage,
we would not want to die, but then I am not sure why it has to be
different from error().
Jeff King March 28, 2021, 7:17 a.m. UTC | #2
On Sat, Mar 27, 2021 at 11:12:40PM -0700, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
> 
> > Add a bug() function that works like error() except the message is
> > prefixed with "bug:" instead of "error:".
> >
> > The reason this is needed is for e.g. the fsck code. If we encounter
> > what we'd consider a BUG() in the middle of fsck traversal we'd still
> > like to try as hard as possible to go past that object and complete
> > the fsck, instead of hard dying. A follow-up commit will introduce
> > such a use in object-file.c.
> 
> Reading the description above, i.e. "to go past that object", the
> assumed use case seems to be to deal with a data error, not a
> program bug (which is where we use BUG()---e.g. one helper function
> in the fsck code detected that the caller wasn't careful enough to
> vet the data it has and called it with incoherent data).  If we find
> a tree entry whose mode bits implies that the object recorded in the
> entry ought to be a blob, and later find out that the object turns
> out to be a tree, that is a corrupt repository and the code that
> detected is not buggy (and we shouldn't use BUG(), of course).
> 
> So, ... I am skeptical.  If the code is prepared to handle breakage,
> we would not want to die, but then I am not sure why it has to be
> different from error().

Yeah, this seems like it is missing the point of BUG() completely.  I
took a peek at patch 5/5 of the follow-on, which uses bug(). It looks
like it should really be an error() return or similar. The root cause
would be open_istream() on a loose object failing (which might be
corruption, or might even be a transient OS error!).

-Peff
Ævar Arnfjörð Bjarmason March 29, 2021, 1:25 p.m. UTC | #3
On Sun, Mar 28 2021, Jeff King wrote:

> On Sat, Mar 27, 2021 at 11:12:40PM -0700, Junio C Hamano wrote:
>
>> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>> 
>> > Add a bug() function that works like error() except the message is
>> > prefixed with "bug:" instead of "error:".
>> >
>> > The reason this is needed is for e.g. the fsck code. If we encounter
>> > what we'd consider a BUG() in the middle of fsck traversal we'd still
>> > like to try as hard as possible to go past that object and complete
>> > the fsck, instead of hard dying. A follow-up commit will introduce
>> > such a use in object-file.c.
>> 
>> Reading the description above, i.e. "to go past that object", the
>> assumed use case seems to be to deal with a data error, not a
>> program bug (which is where we use BUG()---e.g. one helper function
>> in the fsck code detected that the caller wasn't careful enough to
>> vet the data it has and called it with incoherent data).  If we find
>> a tree entry whose mode bits implies that the object recorded in the
>> entry ought to be a blob, and later find out that the object turns
>> out to be a tree, that is a corrupt repository and the code that
>> detected is not buggy (and we shouldn't use BUG(), of course).
>> 
>> So, ... I am skeptical.  If the code is prepared to handle breakage,
>> we would not want to die, but then I am not sure why it has to be
>> different from error().
>
> Yeah, this seems like it is missing the point of BUG() completely.  I
> took a peek at patch 5/5 of the follow-on, which uses bug(). It looks
> like it should really be an error() return or similar. The root cause
> would be open_istream() on a loose object failing (which might be
> corruption, or might even be a transient OS error!).

I don't feel strongly about this bug() thing, I'll drop it if you two
don't like it.

But that's not why I added it, yes you can now carefully read the code
and reason that this code is unreachable now, as I think it is.

But it may not stay that way, refactoring how we handle I/O errors
etc. further down the stack is the sort of thing that if this bug()
wasn't there would cause us to otherwise silently lose the
error. I.e. does check_object_signature() always promise to return
non-zero *only* if the signature isn't OK?

So maybe we are happy to just make that promise, in which case yes, this
should/could be an error() in this case.

But isn't this also useful for multi-threaded code? E.g. let's say fsck
learns to map-reduce its fsck-ing of objects across threads. One of them
encounters a BUG(). Do we want to hard kill the whole thing or try to
limp ahead and report partial results from the other thread(s)?

We have than now with pack-objects/grep, but I'm struggling to find a
use-case for a partial grep result if e.g. PCRE fails with a BUG(...)
...
Jeff King March 31, 2021, 11:06 a.m. UTC | #4
On Mon, Mar 29, 2021 at 03:25:09PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > Yeah, this seems like it is missing the point of BUG() completely.  I
> > took a peek at patch 5/5 of the follow-on, which uses bug(). It looks
> > like it should really be an error() return or similar. The root cause
> > would be open_istream() on a loose object failing (which might be
> > corruption, or might even be a transient OS error!).
> 
> I don't feel strongly about this bug() thing, I'll drop it if you two
> don't like it.
> 
> But that's not why I added it, yes you can now carefully read the code
> and reason that this code is unreachable now, as I think it is.
> 
> But it may not stay that way, refactoring how we handle I/O errors
> etc. further down the stack is the sort of thing that if this bug()
> wasn't there would cause us to otherwise silently lose the
> error. I.e. does check_object_signature() always promise to return
> non-zero *only* if the signature isn't OK?
> 
> So maybe we are happy to just make that promise, in which case yes, this
> should/could be an error() in this case.

I didn't dig into what check_object_signature() promises, but I don't
think it matters for my argument. If the case you are looking at can be
triggered by bad on-disk data, transient OS errors, etc, then it should
be an error() or a die(), or whatever is appropriate for the code. If it
is meant to be an invariant of the code that it should never trigger,
then it should be a BUG(), so that we loudly inform people that the
code's assumption has been violated.

But I do not see any point in a bug() that does not abort(). The point
of BUG() is that nobody is supposed to see it, and we should be as loud
as possible if we do.

And if there is a call site that is in doubt about what it may be fed,
then it should just be an error() or die().

> But isn't this also useful for multi-threaded code? E.g. let's say fsck
> learns to map-reduce its fsck-ing of objects across threads. One of them
> encounters a BUG(). Do we want to hard kill the whole thing or try to
> limp ahead and report partial results from the other thread(s)?

Yes, we want to hard kill. The point of BUG() is that it is not supposed
to happen, and there is no point in limping further.

-Peff
diff mbox series

Patch

diff --git a/Documentation/technical/api-error-handling.txt b/Documentation/technical/api-error-handling.txt
index 8be4f4d0d6a..9d6ac6f6649 100644
--- a/Documentation/technical/api-error-handling.txt
+++ b/Documentation/technical/api-error-handling.txt
@@ -1,7 +1,7 @@ 
 Error reporting in git
 ======================
 
-`BUG`, `die`, `usage`, `error`, and `warning` report errors of
+`BUG`, `bug`, `die`, `usage`, `error`, and `warning` report errors of
 various kinds.
 
 - `BUG` is for failed internal assertions that should never happen,
@@ -18,6 +18,12 @@  various kinds.
   to the user and returns -1 for convenience in signaling the error
   to the caller.
 
+- `bug` (lower-case, not `BUG`) is supposed to be used like `BUG` but
+  has the same non-fatal semantics as `error`. It's meant to signal an
+  internal bug in a library whose caller might still want to attempt
+  some amount of graceful recovery, or to append other error output of
+  their own.
+
 - `warning` is for reporting situations that probably should not
   occur but which the user (and Git) can continue to work around
   without running into too many problems.  Like `error`, it
diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
index 3f52f981a2d..cafe373f405 100644
--- a/Documentation/technical/api-trace2.txt
+++ b/Documentation/technical/api-trace2.txt
@@ -465,8 +465,8 @@  completed.)
 ------------
 
 `"error"`::
-	This event is emitted when one of the `BUG()`, `error()`, `die()`,
-	`warning()`, or `usage()` functions are called.
+	This event is emitted when one of the `BUG()`, `bug()`, `error()`,
+	`die()`, `warning()`, or `usage()` functions are called.
 +
 ------------
 {
diff --git a/git-compat-util.h b/git-compat-util.h
index 9ddf9d7044b..13c1dcf9dcc 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -463,6 +463,7 @@  NORETURN void usage(const char *err);
 NORETURN void usagef(const char *err, ...) __attribute__((format (printf, 1, 2)));
 NORETURN void die(const char *err, ...) __attribute__((format (printf, 1, 2)));
 NORETURN void die_errno(const char *err, ...) __attribute__((format (printf, 1, 2)));
+int bug(const char *err, ...) __attribute__((format (printf, 1, 2)));
 int error(const char *err, ...) __attribute__((format (printf, 1, 2)));
 int error_errno(const char *err, ...) __attribute__((format (printf, 1, 2)));
 void warning(const char *err, ...) __attribute__((format (printf, 1, 2)));
@@ -497,6 +498,8 @@  static inline int const_error(void)
 typedef void (*report_fn)(const char *, va_list params);
 
 void set_die_routine(NORETURN_PTR report_fn routine);
+void set_bug_routine(report_fn routine);
+report_fn get_bug_routine(void);
 void set_error_routine(report_fn routine);
 report_fn get_error_routine(void);
 void set_warn_routine(report_fn routine);
diff --git a/run-command.c b/run-command.c
index be6bc128cd9..8b818b063ff 100644
--- a/run-command.c
+++ b/run-command.c
@@ -348,6 +348,12 @@  static void fake_fatal(const char *err, va_list params)
 	vreportf("fatal: ", err, params);
 }
 
+static void child_bug_fn(const char *err, va_list params)
+{
+	const char msg[] = "bug() should not be called in child\n";
+	xwrite(2, msg, sizeof(msg) - 1);
+}
+
 static void child_error_fn(const char *err, va_list params)
 {
 	const char msg[] = "error() should not be called in child\n";
@@ -371,9 +377,12 @@  static void NORETURN child_die_fn(const char *err, va_list params)
 static void child_err_spew(struct child_process *cmd, struct child_err *cerr)
 {
 	static void (*old_errfn)(const char *err, va_list params);
+	static void (*old_bugfn)(const char *err, va_list params);
 
 	old_errfn = get_error_routine();
 	set_error_routine(fake_fatal);
+	old_bugfn = get_bug_routine();
+	set_bug_routine(fake_fatal);
 	errno = cerr->syserr;
 
 	switch (cerr->err) {
@@ -399,6 +408,7 @@  static void child_err_spew(struct child_process *cmd, struct child_err *cerr)
 		error_errno("cannot exec '%s'", cmd->argv[0]);
 		break;
 	}
+	set_bug_routine(old_bugfn);
 	set_error_routine(old_errfn);
 }
 
@@ -789,6 +799,7 @@  int start_command(struct child_process *cmd)
 		 * called, they can take stdio locks and malloc.
 		 */
 		set_die_routine(child_die_fn);
+		set_error_routine(child_bug_fn);
 		set_error_routine(child_error_fn);
 		set_warn_routine(child_warn_fn);
 
diff --git a/t/helper/test-trace2.c b/t/helper/test-trace2.c
index f93633f895a..6248427e4bf 100644
--- a/t/helper/test-trace2.c
+++ b/t/helper/test-trace2.c
@@ -198,7 +198,7 @@  static int ut_006data(int argc, const char **argv)
 	return 0;
 }
 
-static int ut_007bug(int argc, const char **argv)
+static int ut_007BUG(int argc, const char **argv)
 {
 	/*
 	 * Exercise BUG() to ensure that the message is printed to trace2.
@@ -206,6 +206,15 @@  static int ut_007bug(int argc, const char **argv)
 	BUG("the bug message");
 }
 
+static int ut_008bug(int argc, const char **argv)
+{
+	/*
+	 * Exercise BUG() to ensure that the message is printed to trace2.
+	 */
+	bug("the bug message");
+	return 0;
+}
+
 /*
  * Usage:
  *     test-tool trace2 <ut_name_1> <ut_usage_1>
@@ -222,7 +231,8 @@  static struct unit_test ut_table[] = {
 	{ ut_004child,    "004child",  "[<child_command_line>]" },
 	{ ut_005exec,     "005exec",   "<git_command_args>" },
 	{ ut_006data,     "006data",   "[<category> <key> <value>]+" },
-	{ ut_007bug,      "007bug",    "" },
+	{ ut_007BUG,      "007bug",    "" },
+	{ ut_008bug,      "008bug",    "" },
 };
 /* clang-format on */
 
diff --git a/t/t0210-trace2-normal.sh b/t/t0210-trace2-normal.sh
index 0cf3a63b75b..9c866af971f 100755
--- a/t/t0210-trace2-normal.sh
+++ b/t/t0210-trace2-normal.sh
@@ -166,6 +166,25 @@  test_expect_success 'BUG messages are written to trace2' '
 	test_cmp expect actual
 '
 
+# Verb 008bug
+#
+# Check that BUG writes to trace2
+
+test_expect_success 'bug messages are written to trace2' '
+	test_when_finished "rm trace.normal actual expect" &&
+	GIT_TRACE2="$(pwd)/trace.normal" test-tool trace2 008bug &&
+	perl "$TEST_DIRECTORY/t0210/scrub_normal.perl" <trace.normal >actual &&
+	cat >expect <<-EOF &&
+		version $V
+		start _EXE_ trace2 008bug
+		cmd_name trace2 (trace2)
+		error the bug message
+		exit elapsed:_TIME_ code:0
+		atexit elapsed:_TIME_ code:0
+	EOF
+	test_cmp expect actual
+'
+
 sane_unset GIT_TRACE2_BRIEF
 
 # Now test without environment variables and get all Trace2 settings
diff --git a/usage.c b/usage.c
index c7d233b0de9..34bd3abf048 100644
--- a/usage.c
+++ b/usage.c
@@ -69,6 +69,13 @@  static NORETURN void die_builtin(const char *err, va_list params)
 	exit(128);
 }
 
+static void bug_builtin(const char *err, va_list params)
+{
+	trace2_cmd_error_va(err, params);
+
+	vreportf("bug: ", err, params);
+}
+
 static void error_builtin(const char *err, va_list params)
 {
 	trace2_cmd_error_va(err, params);
@@ -109,6 +116,7 @@  static int die_is_recursing_builtin(void)
  * (ugh), so keep things static. */
 static NORETURN_PTR report_fn usage_routine = usage_builtin;
 static NORETURN_PTR report_fn die_routine = die_builtin;
+static report_fn bug_routine = bug_builtin;
 static report_fn error_routine = error_builtin;
 static report_fn warn_routine = warn_builtin;
 static int (*die_is_recursing)(void) = die_is_recursing_builtin;
@@ -118,11 +126,22 @@  void set_die_routine(NORETURN_PTR report_fn routine)
 	die_routine = routine;
 }
 
+
+void set_bug_routine(report_fn routine)
+{
+	bug_routine = routine;
+}
+
 void set_error_routine(report_fn routine)
 {
 	error_routine = routine;
 }
 
+report_fn get_bug_routine(void)
+{
+	return bug_routine;
+}
+
 report_fn get_error_routine(void)
 {
 	return error_routine;
@@ -223,6 +242,16 @@  int error_errno(const char *fmt, ...)
 	return -1;
 }
 
+int bug(const char *err, ...)
+{
+	va_list params;
+
+	va_start(params, err);
+	bug_routine(err, params);
+	va_end(params);
+	return -1;
+}
+
 #undef error
 int error(const char *err, ...)
 {