diff mbox series

[3/6] test-lib: rely on logs to detect leaks

Message ID 20250101201444.GC3305462@coredump.intra.peff.net (mailing list archive)
State Accepted
Commit 5fa0c4dd296d3731bbbd1977d7bf9c50d8c4b7c1
Headers show
Series [1/6] test-lib: use individual lsan dir for --stress runs | expand

Commit Message

Jeff King Jan. 1, 2025, 8:14 p.m. UTC
When we run with sanitizers, we set abort_on_error=1 so that the tests
themselves can detect problems directly (when the buggy program exits
with SIGABRT). This has one blind spot, though: we don't always check
the exit codes for all programs (e.g., helpers like upload-pack invoked
behind the scenes).

For ASan and UBSan this is mostly fine; they exit as soon as they see an
error, so the unexpected abort of the program causes the test to fail
anyway.

But for LSan, the program runs to completion, since we can only check
for leaks at the end. And in that case we could miss leak reports. And
thus we started checking LSan logs in faececa53f (test-lib: have the
"check" mode for SANITIZE=leak consider leak logs, 2022-07-28).
Originally the logs were optional, but logs are generated (and checked)
always as of 8c1d6691bc (test-lib: GIT_TEST_SANITIZE_LEAK_LOG enabled by
default, 2024-07-11). And we even check them for each test snippet, as
of cf1464331b (test-lib: check for leak logs after every test,
2024-09-24).

So now aborting on error is superfluous for LSan! We can get everything
we need by checking the logs. And checking the logs is actually
preferable, since it gives us more control over silencing false
positives (something we do not yet do, but will soon).

So let's tell LSan to just exit normally, even if it finds leaks. We can
do so with exitcode=0, which also suppresses the abort_on_error flag.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/test-lib.sh | 1 +
 1 file changed, 1 insertion(+)

Comments

Patrick Steinhardt Jan. 3, 2025, 12:05 p.m. UTC | #1
On Wed, Jan 01, 2025 at 03:14:44PM -0500, Jeff King wrote:
> When we run with sanitizers, we set abort_on_error=1 so that the tests
> themselves can detect problems directly (when the buggy program exits
> with SIGABRT). This has one blind spot, though: we don't always check
> the exit codes for all programs (e.g., helpers like upload-pack invoked
> behind the scenes).
> 
> For ASan and UBSan this is mostly fine; they exit as soon as they see an
> error, so the unexpected abort of the program causes the test to fail
> anyway.
> 
> But for LSan, the program runs to completion, since we can only check
> for leaks at the end. And in that case we could miss leak reports. And
> thus we started checking LSan logs in faececa53f (test-lib: have the
> "check" mode for SANITIZE=leak consider leak logs, 2022-07-28).
> Originally the logs were optional, but logs are generated (and checked)
> always as of 8c1d6691bc (test-lib: GIT_TEST_SANITIZE_LEAK_LOG enabled by
> default, 2024-07-11). And we even check them for each test snippet, as
> of cf1464331b (test-lib: check for leak logs after every test,
> 2024-09-24).
> 
> So now aborting on error is superfluous for LSan! We can get everything
> we need by checking the logs. And checking the logs is actually
> preferable, since it gives us more control over silencing false
> positives (something we do not yet do, but will soon).
> 
> So let's tell LSan to just exit normally, even if it finds leaks. We can
> do so with exitcode=0, which also suppresses the abort_on_error flag.

The only downside I can think of is that we now run the whole testcase
to completion before checking for leaks, whereas beforehand we most
likely aborted the testcase on hitting the first leak. It follows that
we may now have multiple leak reports, and it is not immediately clear
which of the commands has actually been failing.

I think we're now in a clean-enough state regarding memory leaks that
this isn't a huge issue anymore though.

Patrick
Jeff King Jan. 3, 2025, 8:10 p.m. UTC | #2
On Fri, Jan 03, 2025 at 01:05:41PM +0100, Patrick Steinhardt wrote:

> > So now aborting on error is superfluous for LSan! We can get everything
> > we need by checking the logs. And checking the logs is actually
> > preferable, since it gives us more control over silencing false
> > positives (something we do not yet do, but will soon).
> > 
> > So let's tell LSan to just exit normally, even if it finds leaks. We can
> > do so with exitcode=0, which also suppresses the abort_on_error flag.
> 
> The only downside I can think of is that we now run the whole testcase
> to completion before checking for leaks, whereas beforehand we most
> likely aborted the testcase on hitting the first leak. It follows that
> we may now have multiple leak reports, and it is not immediately clear
> which of the commands has actually been failing.

True, I didn't think of that. We do at least check the logs after each
case, so it would have to be multiple leaks in the same snippet. The
LSan output also mentions the process name, though not the arguments
(and some snippets may invoke the same program multiple times).

The other thing I guess we'd miss is that SIGABRT can optionally produce
a core dump. I don't think I've ever found that useful for LSan (since
it's not exiting until the end of process) but occasionally have used it
for ASan (though this patch does not change anything there).

> I think we're now in a clean-enough state regarding memory leaks that
> this isn't a huge issue anymore though.

Yeah, agreed. We can see how much of a problem it is in practice. You
can always switch the flag yourself for a specific run, like:

  LSAN_OPTIONS=exitcode=23 ./twhatever -i

-Peff
diff mbox series

Patch

diff --git a/t/test-lib.sh b/t/test-lib.sh
index 96f2dfb69d..dd2ba6e6cc 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -80,6 +80,7 @@  prepend_var ASAN_OPTIONS : detect_leaks=0
 export ASAN_OPTIONS
 
 prepend_var LSAN_OPTIONS : $GIT_SAN_OPTIONS
+prepend_var LSAN_OPTIONS : exitcode=0
 prepend_var LSAN_OPTIONS : fast_unwind_on_malloc=0
 export LSAN_OPTIONS