Message ID | patch-1.1-9190f3c128f-20211022T102725Z-avarab@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | leak tests: add an interface to the LSAN_OPTIONS "suppressions" | expand |
On Fri, Oct 22, 2021 at 12:32:17PM +0200, Ævar Arnfjörð Bjarmason wrote: > Extend the SANITIZE=leak testing mode added in 956d2e4639b (tests: add > a test mode for SANITIZE=leak, run it in CI, 2021-09-23) to optionally > be able to add a "suppressions" file to the $LSAN_OPTIONS. > > This allows for marking tests as passing with > "TEST_PASSES_SANITIZE_LEAK=true" when they still have failure due more > general widespread memory leaks, such as from the "git log" family of > commands. We can now mark the "git -C" tests as passing. > > For getting specific tests to pass this is preferable to using > UNLEAK() in these codepaths, as I'll have fixes for those leaks soon, > and being able to atomically mark relevant tests as passing with > "TEST_PASSES_SANITIZE_LEAK=true" helps to explain those changes. See > [1] for more details. > > 1. https://lore.kernel.org/git/211022.86sfwtl6uj.gmgdl@evledraar.gmail.com/ > > Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> > --- > > On Fri, Oct 22 2021, Ævar Arnfjörð Bjarmason wrote: > > > On Fri, Oct 22 2021, Taylor Blau wrote: > > > >> On Thu, Oct 21, 2021 at 01:50:55PM +0200, Ævar Arnfjörð Bjarmason wrote: > >>> > >>> On Wed, Oct 20 2021, Taylor Blau wrote: > [...] > > If you want to pick that approach up and run with it I think it would > > probably make sense to factor that suggested test_expect_success out > > into a function in test-lib-functions.sh or whatever, and call it as > > e.g.: > > > > TEST_PASSES_SANITIZE_LEAK=true > > . ./test-lib.sh > > declare_known_leaks <<-\EOF > > add_rev_cmdline > > [...] > > EOF > > > > Then pipe it through sed 's/^/leak:/' and have it set LSAN_OPTIONS for > > you. > > > > Doing it that way would be less boilerplate for each test that wants it, > > and is also more likely to work with other non-LSAN leak appoaches, > > i.e. as long as something can take a list of lines matching stack traces > > we can feed that to that leak checker's idea of a whitelist. > > I just went ahead and hacked that up. If you're OK with that approach > it would really help reduce the work for leak changes I've got > planned, and as noted gives you the end-state of a passing 5319. > > I don't know if it makes more sense for you to base on top of this > if/when Junio picks it up, or to integrate it into your series > etc. Maybe Junio will chime in ... Hmm. This seems neat, but I haven't been able to get it to work without encountering a rather annoying bug along the way. Here is the preamble of my modified t5319 to include all of the leaks I found in the previous round (but decided not to fix): TEST_PASSES_SANITIZE_LEAK=true . ./test-lib.sh todo_leaks <<-\EOF ^add_rev_cmdline$ ^cmd_log_init_finish$ ^prepare_revision_walk$ ^prep_parse_options$ EOF So running that when git is compiled with SANITIZE=leak *should* work, but instead produces this rather annoying leak detection after t5319.7: ================================================================= ==1785298==ERROR: LeakSanitizer: detected memory leaks Indirect leak of 41 byte(s) in 3 object(s) allocated from: #0 0x7f2f2f866db0 in __interceptor_malloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:54 #1 0x7f2f2f64b4aa in __GI___strdup string/strdup.c:42 ----------------------------------------------------- Suppressions used: count bytes template 1 576 ^add_rev_cmdline$ ----------------------------------------------------- SUMMARY: LeakSanitizer: 41 byte(s) leaked in 3 allocation(s). Aborted Huh? Looking past __GI___strdup (which I assume is just a symbolification failure on ASan's part), who calls strdup()? That trace is annoyingly incomplete, and doesn't really give us anything to go off of. It does seem to remind me of this: https://github.com/google/sanitizers/issues/148 though that issue goes in so many different directions that I'm not sure whether it really is the same issue or not. In any case, that leak *still* shows up even when suppressing xmalloc() and xrealloc(), so I almost think that it's a leak within ASan itself. But most interesting is that those calls go away when I stop setting `suppressions` in $LSAN_OPTIONS by dropping the call to your todo_leaks. So this all feels like a bug in ASan to me. I'm curious if it works on your system, but in the meantime I think the best path forward is to drop the last patch of my original series (the one with the three UNLEAK() calls) and to avoid relying on this patch for the time being. Thanks, Taylor
On Tue, Oct 26, 2021 at 04:23:14PM -0400, Taylor Blau wrote: > So this all feels like a bug in ASan to me. I'm curious if it works on > your system, but in the meantime I think the best path forward is to > drop the last patch of my original series (the one with the three > UNLEAK() calls) and to avoid relying on this patch for the time being. Bugs aside, I'd much rather see UNLEAK() annotations than external ones, for all the reasons we introduced UNLEAK() in the first place: - it keeps the annotations near the code. Yes, that creates conflicts when the code is changed (or the leak is actually fixed), but that's a feature. It keeps them from going stale. - leak-checkers only know where things are allocated, not who is supposed to own them. So you're often annotating the wrong thing; it's not a strdup() call which is buggy and leaking, but the function five layers up the stack which was supposed to take ownership and didn't. And if we avoid any annotation bugs by doing so, that's icing on the cake. :) -Peff
On Tue, Oct 26, 2021 at 05:11:29PM -0400, Jeff King wrote: > On Tue, Oct 26, 2021 at 04:23:14PM -0400, Taylor Blau wrote: > > > So this all feels like a bug in ASan to me. I'm curious if it works on > > your system, but in the meantime I think the best path forward is to > > drop the last patch of my original series (the one with the three > > UNLEAK() calls) and to avoid relying on this patch for the time being. > > Bugs aside, I'd much rather see UNLEAK() annotations than external ones, > for all the reasons we introduced UNLEAK() in the first place: > > - it keeps the annotations near the code. Yes, that creates conflicts > when the code is changed (or the leak is actually fixed), but that's > a feature. It keeps them from going stale. I agree completely. I noted as much in my message here: https://lore.kernel.org/git/YXJAfICQN8s5Gm7s@nand.local/ but Ævar made it sound like his work would be made much easier without the conflict. Since I'm not in any kind of rush to make t5319 leak-free, I figured that queueing the parts of that series that wouldn't conflict with Ævar's ongoing work would be a net-positive. > - leak-checkers only know where things are allocated, not who is > supposed to own them. So you're often annotating the wrong thing; > it's not a strdup() call which is buggy and leaking, but the > function five layers up the stack which was supposed to take > ownership and didn't. TBH, I find this much more compelling. Either way, I don't feel strongly enough to deviate from v2 much, and I don't want to create more work for Ævar along the way. Thanks, Taylor
On Tue, Oct 26, 2021 at 05:30:47PM -0400, Taylor Blau wrote: > > Bugs aside, I'd much rather see UNLEAK() annotations than external ones, > > for all the reasons we introduced UNLEAK() in the first place: > > > > - it keeps the annotations near the code. Yes, that creates conflicts > > when the code is changed (or the leak is actually fixed), but that's > > a feature. It keeps them from going stale. > > I agree completely. I noted as much in my message here: > > https://lore.kernel.org/git/YXJAfICQN8s5Gm7s@nand.local/ > > but Ævar made it sound like his work would be made much easier without > the conflict. Since I'm not in any kind of rush to make t5319 leak-free, > I figured that queueing the parts of that series that wouldn't conflict > with Ævar's ongoing work would be a net-positive. Yeah, to be clear, if there's work in progress in an area, then _not_ annotating it (with either method) is perfectly fine with me in the meantime. -Peff
On Tue, Oct 26 2021, Taylor Blau wrote: > On Fri, Oct 22, 2021 at 12:32:17PM +0200, Ævar Arnfjörð Bjarmason wrote: >> Extend the SANITIZE=leak testing mode added in 956d2e4639b (tests: add >> a test mode for SANITIZE=leak, run it in CI, 2021-09-23) to optionally >> be able to add a "suppressions" file to the $LSAN_OPTIONS. >> >> This allows for marking tests as passing with >> "TEST_PASSES_SANITIZE_LEAK=true" when they still have failure due more >> general widespread memory leaks, such as from the "git log" family of >> commands. We can now mark the "git -C" tests as passing. >> >> For getting specific tests to pass this is preferable to using >> UNLEAK() in these codepaths, as I'll have fixes for those leaks soon, >> and being able to atomically mark relevant tests as passing with >> "TEST_PASSES_SANITIZE_LEAK=true" helps to explain those changes. See >> [1] for more details. >> >> 1. https://lore.kernel.org/git/211022.86sfwtl6uj.gmgdl@evledraar.gmail.com/ >> >> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> >> --- >> >> On Fri, Oct 22 2021, Ævar Arnfjörð Bjarmason wrote: >> >> > On Fri, Oct 22 2021, Taylor Blau wrote: >> > >> >> On Thu, Oct 21, 2021 at 01:50:55PM +0200, Ævar Arnfjörð Bjarmason wrote: >> >>> >> >>> On Wed, Oct 20 2021, Taylor Blau wrote: >> [...] >> > If you want to pick that approach up and run with it I think it would >> > probably make sense to factor that suggested test_expect_success out >> > into a function in test-lib-functions.sh or whatever, and call it as >> > e.g.: >> > >> > TEST_PASSES_SANITIZE_LEAK=true >> > . ./test-lib.sh >> > declare_known_leaks <<-\EOF >> > add_rev_cmdline >> > [...] >> > EOF >> > >> > Then pipe it through sed 's/^/leak:/' and have it set LSAN_OPTIONS for >> > you. >> > >> > Doing it that way would be less boilerplate for each test that wants it, >> > and is also more likely to work with other non-LSAN leak appoaches, >> > i.e. as long as something can take a list of lines matching stack traces >> > we can feed that to that leak checker's idea of a whitelist. >> >> I just went ahead and hacked that up. If you're OK with that approach >> it would really help reduce the work for leak changes I've got >> planned, and as noted gives you the end-state of a passing 5319. >> >> I don't know if it makes more sense for you to base on top of this >> if/when Junio picks it up, or to integrate it into your series >> etc. Maybe Junio will chime in ... > > Hmm. This seems neat, but I haven't been able to get it to work without > encountering a rather annoying bug along the way. > > Here is the preamble of my modified t5319 to include all of the leaks I > found in the previous round (but decided not to fix): > > TEST_PASSES_SANITIZE_LEAK=true > . ./test-lib.sh > todo_leaks <<-\EOF > ^add_rev_cmdline$ > ^cmd_log_init_finish$ > ^prepare_revision_walk$ > ^prep_parse_options$ > EOF > > So running that when git is compiled with SANITIZE=leak *should* work, > but instead produces this rather annoying leak detection after t5319.7: > > ================================================================= > ==1785298==ERROR: LeakSanitizer: detected memory leaks > > Indirect leak of 41 byte(s) in 3 object(s) allocated from: > #0 0x7f2f2f866db0 in __interceptor_malloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:54 > #1 0x7f2f2f64b4aa in __GI___strdup string/strdup.c:42 > > ----------------------------------------------------- > Suppressions used: > count bytes template > 1 576 ^add_rev_cmdline$ > ----------------------------------------------------- > > SUMMARY: LeakSanitizer: 41 byte(s) leaked in 3 allocation(s). > Aborted > > Huh? Looking past __GI___strdup (which I assume is just a > symbolification failure on ASan's part), who calls strdup()? That trace > is annoyingly incomplete, and doesn't really give us anything to go off > of. > > It does seem to remind me of this: > > https://github.com/google/sanitizers/issues/148 > > though that issue goes in so many different directions that I'm not sure > whether it really is the same issue or not. In any case, that leak > *still* shows up even when suppressing xmalloc() and xrealloc(), so I > almost think that it's a leak within ASan itself. > > But most interesting is that those calls go away when I stop setting > `suppressions` in $LSAN_OPTIONS by dropping the call to your todo_leaks. > > So this all feels like a bug in ASan to me. I'm curious if it works on > your system, but in the meantime I think the best path forward is to > drop the last patch of my original series (the one with the three > UNLEAK() calls) and to avoid relying on this patch for the time being. There are similar cases where LSAN doesn't provide as meaningful of a stacktrace as valgrind, sometimes when tracing leaks I get a relatively bad stacktrace like that ending in some __GI___*<stdlib-name> function. I'll usually compile without SANITIZE=leak and just run a slower: valgrind --leak-check=full --show-leak-kinds=all In this case however it's not a bug or bad leak tracing, but an artifact of us using these stacktrace exclusions. If you run this manually you'll see: $ ~/g/git/git -c core.multiPackIndex=false rev-list --objects --all cd0747a9352b58d112f0010134351efc7bbad4a6 [... snipped output ...] ================================================================= ==58023==ERROR: LeakSanitizer: detected memory leaks Direct leak of 576 byte(s) in 1 object(s) allocated from: #0 0x7fd0bfae50c5 in __interceptor_realloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:82 #1 0x5637b3bd9163 in xrealloc /home/avar/g/git/wrapper.c:126 #2 0x5637b3b6a1e4 in add_rev_cmdline /home/avar/g/git/revision.c:1482 #3 0x5637b3b6a412 in handle_one_ref /home/avar/g/git/revision.c:1534 #4 0x5637b3b49c4e in do_for_each_ref_helper /home/avar/g/git/refs.c:1483 #5 0x5637b3b54afc in do_for_each_repo_ref_iterator refs/iterator.c:418 #6 0x5637b3b49cc8 in do_for_each_ref /home/avar/g/git/refs.c:1498 #7 0x5637b3b49d07 in refs_for_each_ref /home/avar/g/git/refs.c:1504 #8 0x5637b3b6a563 in handle_refs /home/avar/g/git/revision.c:1578 #9 0x5637b3b6e0e7 in handle_revision_pseudo_opt /home/avar/g/git/revision.c:2597 #10 0x5637b3b6e9d5 in setup_revisions /home/avar/g/git/revision.c:2738 #11 0x5637b39ebc58 in cmd_rev_list builtin/rev-list.c:550 #12 0x5637b3932a89 in run_builtin /home/avar/g/git/git.c:461 #13 0x5637b3932e98 in handle_builtin /home/avar/g/git/git.c:713 #14 0x5637b3933105 in run_argv /home/avar/g/git/git.c:780 #15 0x5637b39335ae in cmd_main /home/avar/g/git/git.c:911 #16 0x5637b3a1a898 in main /home/avar/g/git/common-main.c:52 #17 0x7fd0bf860d09 in __libc_start_main ../csu/libc-start.c:308 Indirect leak of 41 byte(s) in 3 object(s) allocated from: #0 0x7fd0bfae4db0 in __interceptor_malloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:54 #1 0x7fd0bf8c7e4a in __GI___strdup string/strdup.c:42 SUMMARY: LeakSanitizer: 617 byte(s) leaked in 4 allocation(s). Which is why I put the "strdup" there, but forgot to tell you when I submitted the real PATCH version that that one shouldn't have the ^$ anchoring. FWIW this will work on top of your series: diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index a3c72b68f7c..891de720b2c 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -2,6 +2,17 @@ test_description='multi-pack-indexes' . ./test-lib.sh +todo_leaks <<-\EOF +^add_rev_cmdline$ +^cmd_log_init_finish$ +^prep_parse_options$ +^prepare_revision_walk$ +^start_progress_delay$ + +# An indirect leak reported because we excluded the leaks above +strdup$ +EOF + GIT_TEST_MULTI_PACK_INDEX=0 objdir=.git/objects
On Tue, Oct 26 2021, Jeff King wrote: It seems that I've gotten what I wanted from this side-thread, yay! Thanks both, not to pile on, but just to clarify: > On Tue, Oct 26, 2021 at 04:23:14PM -0400, Taylor Blau wrote: > >> So this all feels like a bug in ASan to me. I'm curious if it works on >> your system, but in the meantime I think the best path forward is to >> drop the last patch of my original series (the one with the three >> UNLEAK() calls) and to avoid relying on this patch for the time being. > > Bugs aside, I'd much rather see UNLEAK() annotations than external ones, > for all the reasons we introduced UNLEAK() in the first place: > > - it keeps the annotations near the code. Yes, that creates conflicts > when the code is changed (or the leak is actually fixed), but that's > a feature. It keeps them from going stale. I fully agree with you in general for "good" uses of UNLEAK(). E.g. consider: struct strbuf buf = xmalloc(...); [...] UNLEAK(buf); If I was fixing leaks in that sort of code what I pointed out downthread in [1] wouldn't apply. So to clarify, I'm not asking in [1] that UNLEAK() not be used at all while we have in-flight leak fixes. I.e. I'd run into a textual conflict, but that would be trivial to resolve, and obvious what the semantic & textual conflict was. Rather, it's not marking specific leaks, but UNLEAK() on a container struct that's problematic. Depending on how they're used those structs may or may not leak. So e.g. Taylor's upthread 11/11[2] contained: diff --git a/builtin/rev-list.c b/builtin/rev-list.c index 36cb909eba..df3811e763 100644 --- a/builtin/rev-list.c +++ b/builtin/rev-list.c @@ -549,6 +549,8 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix) argc = setup_revisions(argc, argv, &revs, &s_r_opt); + UNLEAK(revs); + memset(&info, 0, sizeof(info)); info.revs = &revs; if (revs.bisect) Which if you on master replace that with: diff --git a/builtin/rev-list.c b/builtin/rev-list.c index 36cb909ebaa..3cce87e0eb7 100644 --- a/builtin/rev-list.c +++ b/builtin/rev-list.c @@ -548,6 +548,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix) revs.do_not_die_on_missing_tree = 1; argc = setup_revisions(argc, argv, &revs, &s_r_opt); + BUG("using this?"); memset(&info, 0, sizeof(info)); info.revs = &revs; You'll run into a failure with: GIT_TEST_PASSING_SANITIZE_LEAK=true ./t0002-gitfile.sh I.e. our existing leak tests already take that codepath and don't run into a leak from using "revs". So we wouldn't just be marking a known specific leak, but hiding all leaks & non-leaks in container from the top-level, and thus hide potential regressions, an addition to attaining the end-goal of marking some specific test as passing. Now, that specific example isn't very useful right now, since we don't have a release_revisions() at all, but e.g. the first batch of fixes I've got for the revisions.[ch] leak fix most common cases (e.g. the pending array), but not some obscure ones. Being able to incrementally mark those leaks as fixed on a test-by-test basis has the advantage over UNLEAK() that we won't have regressions while we're not at a 100% leak free (at which point we could remove the UNLEAK()). > - leak-checkers only know where things are allocated, not who is > supposed to own them. So you're often annotating the wrong thing; > it's not a strdup() call which is buggy and leaking, but the > function five layers up the stack which was supposed to take > ownership and didn't. As noted in [3] this case is because the LSAN suppressions list was in play, so we excluded the meaningful part of the stack trace (which is shown in that mail). Hrm, now that I think about it I think that the cases where I had to resort to valgrind to get around such crappy stacktraces (when that was all I got, even without suppressions) were probably all because there was an UNLEAK() in play... 1. https://lore.kernel.org/git/211022.86sfwtl6uj.gmgdl@evledraar.gmail.com/ 2. https://lore.kernel.org/git/f1bb8b73ffdb78995dc5653791f9a64adf216e21.1634787555.git.me@ttaylorr.com/ 3. https://lore.kernel.org/git/211027.86zgquyk52.gmgdl@evledraar.gmail.com/
On Wed, Oct 27, 2021 at 10:04:17AM +0200, Ævar Arnfjörð Bjarmason wrote: > > - it keeps the annotations near the code. Yes, that creates conflicts > > when the code is changed (or the leak is actually fixed), but that's > > a feature. It keeps them from going stale. > > I fully agree with you in general for "good" uses of UNLEAK(). E.g. consider: > > struct strbuf buf = xmalloc(...); > [...] > UNLEAK(buf); > > If I was fixing leaks in that sort of code what I pointed out downthread > in [1] wouldn't apply. I'm OK with such a use of UNLEAK(), though of course you could probably just free the buffer in that instance. But... > So to clarify, I'm not asking in [1] that UNLEAK() not be used at all > while we have in-flight leak fixes. I.e. I'd run into a textual > conflict, but that would be trivial to resolve, and obvious what the > semantic & textual conflict was. > > Rather, it's not marking specific leaks, but UNLEAK() on a container > struct that's problematic. ...that's the whole point of UNLEAK(), IMHO. You know that the container struct is leaked, but you don't care because the program is about to exit, or there's no easy way to avoid the leak. So it's not the "container" element, but rather it can be a problem if people annotate too broadly (you will miss some leaks). In the case of rev_info, there is no way to _not_ leak right now, because it has no cleanup function. In the long run we should probably add one. But in the meantime, annotating them so we can find the _interesting_ leaks is worth doing (and that was the whole point of adding UNLEAK in the first place. And I think the UNLEAK() calls in Taylor's patch are fine in that sense. While yes, _some_ runs of those commands will not leak, the point is to say that when they do leak, they are not interesting. And they are not interesting because that rev_info is in scope until the program exits, so anything it points to is only leaked at a moment where we no longer care. > So we wouldn't just be marking a known specific leak, but hiding all > leaks & non-leaks in container from the top-level, and thus hide > potential regressions, an addition to attaining the end-goal of marking > some specific test as passing. I'd argue it _is_ a known specific leak. It is rev_info going out of scope that causes the leak, not rev_info holding on to memory in things like its pending array. Now those can be interesting, too (if it no longer needs the memory, can it perhaps get rid of it). But IMHO all of that is pretty secondary to clearing the noise so you can find "true" leaks (ones where some sub-function really is allocating and then losing the pointer entirely, especially if it's called an arbitrary number of times). > > - leak-checkers only know where things are allocated, not who is > > supposed to own them. So you're often annotating the wrong thing; > > it's not a strdup() call which is buggy and leaking, but the > > function five layers up the stack which was supposed to take > > ownership and didn't. > > As noted in [3] this case is because the LSAN suppressions list was in > play, so we excluded the meaningful part of the stack trace (which is > shown in that mail). I don't think that's true at all. The annotations you showed in that message, for example, are pointing at add_rev_cmdline(). But that is _not_ the source of the leak, nor where it should be fixed. It is a necessary part of how rev_info works. The leak is when the rev_info goes out of scope without anybody cleaning it up. > Hrm, now that I think about it I think that the cases where I had to > resort to valgrind to get around such crappy stacktraces (when that was > all I got, even without suppressions) were probably all because there > was an UNLEAK() in play... I don't see how UNLEAK() would impact stack traces. It should either make something not-leaked-at-all (in which case LSan will no longer mention it), or it does nothing (it throws some wasted memory into a structure which is itself not leaked). -Peff
Jeff King <peff@peff.net> writes: > I don't think that's true at all. The annotations you showed in that > message, for example, are pointing at add_rev_cmdline(). But that is > _not_ the source of the leak, nor where it should be fixed. It is a > necessary part of how rev_info works. The leak is when the rev_info goes > out of scope without anybody cleaning it up. True. >> Hrm, now that I think about it I think that the cases where I had to >> resort to valgrind to get around such crappy stacktraces (when that was >> all I got, even without suppressions) were probably all because there >> was an UNLEAK() in play... > > I don't see how UNLEAK() would impact stack traces. It should either > make something not-leaked-at-all (in which case LSan will no longer > mention it), or it does nothing (it throws some wasted memory into a > structure which is itself not leaked). I'd prefer to see judicious use of UNLEAK() over LSAN suppressions that lists things that are not the source of the problem. The list being kept in the tests that are far away from the actual source does not help, either. Thanks.
On Wed, Oct 27 2021, Jeff King wrote: > On Wed, Oct 27, 2021 at 10:04:17AM +0200, Ævar Arnfjörð Bjarmason wrote: > >> > - it keeps the annotations near the code. Yes, that creates conflicts >> > when the code is changed (or the leak is actually fixed), but that's >> > a feature. It keeps them from going stale. >> >> I fully agree with you in general for "good" uses of UNLEAK(). E.g. consider: >> >> struct strbuf buf = xmalloc(...); >> [...] >> UNLEAK(buf); >> >> If I was fixing leaks in that sort of code what I pointed out downthread >> in [1] wouldn't apply. > > I'm OK with such a use of UNLEAK(), though of course you could probably > just free the buffer in that instance. > > But... > >> So to clarify, I'm not asking in [1] that UNLEAK() not be used at all >> while we have in-flight leak fixes. I.e. I'd run into a textual >> conflict, but that would be trivial to resolve, and obvious what the >> semantic & textual conflict was. >> >> Rather, it's not marking specific leaks, but UNLEAK() on a container >> struct that's problematic. > > ...that's the whole point of UNLEAK(), IMHO. You know that the container > struct is leaked, but you don't care because the program is about to > exit, or there's no easy way to avoid the leak. > > So it's not the "container" element, but rather it can be a problem if > people annotate too broadly (you will miss some leaks). In the case of > rev_info, there is no way to _not_ leak right now, because it has no > cleanup function. It doesn't have one, but there are uses of setup_revisions() and rev_info usage that don't leak, as that builtin/rev-list.c case shows. I mean, in that case it's not doing much of anything, but at least we test that setup_revisions() itself doesn't leak right now, but wouldn't with UNLEAK(). > In the long run we should probably add one. After the current in-flight patches I've got making diff_free() a bit more useful, and then a revisions_release() that'll handle most cases, which gets us started on mass-marking the tests as leak free. So just FWIW I'm not saying "hey can we hold off on that UNLEAK() for far future xyz", but for a thing I've got queued up that I'd rather not start rewriting... > [...]But in the meantime, annotating them so we can find the > _interesting_ leaks is worth doing (and that was the whole point of > adding UNLEAK in the first place. > > And I think the UNLEAK() calls in Taylor's patch are fine in that sense. > While yes, _some_ runs of those commands will not leak, the point is to > say that when they do leak, they are not interesting. And they are not > interesting because that rev_info is in scope until the program exits, > so anything it points to is only leaked at a moment where we no longer > care. I don't per-se care about leaks at program exit either, but am making them leak-free as a proxy for making the relevant APIs useful for persistent use. So I think the historical approach of just sprinkling UNLEAK() since we're exiting anyway is taking too narrow of a view. In that case if we'd like to see if revisions.[ch] is leak-free we'd need parallel test-helper tests (or whatever) for all its features, or we can just piggy-back on every "git log" invocation in the test suite... >> So we wouldn't just be marking a known specific leak, but hiding all >> leaks & non-leaks in container from the top-level, and thus hide >> potential regressions, an addition to attaining the end-goal of marking >> some specific test as passing. > > I'd argue it _is_ a known specific leak. It is rev_info going out of > scope that causes the leak, not rev_info holding on to memory in things > like its pending array. > > Now those can be interesting, too (if it no longer needs the memory, can > it perhaps get rid of it). But IMHO all of that is pretty secondary to > clearing the noise so you can find "true" leaks (ones where some > sub-function really is allocating and then losing the pointer entirely, > especially if it's called an arbitrary number of times). I wasn't trying to get into the philosophical discussion of whether a struct or a struct it contains can be said to be the "real" source of a leak :) Yeah, I agree that from that POV there's no difference. I'm merely referring to the practical aspect of patches I've got queued up, which do things like fix all leaks of rev_info.pending, mark tests that now pass, fix another rev_info.whatever, mark tests etc. For rev_info in particular it's a bit iffy to UNLEAK() it, since it also contains pointers to stuff it doesn't "own", sometimes things are allocated and handed off to it, sometimes for a while, sometimes permanently. So by UNLEAK()ing it you're not just unleaking "its" memory, but potentially hiding leaks in APIs that need to interact with it in that way. E.g. mailmap.c is one of those cases. >> > - leak-checkers only know where things are allocated, not who is >> > supposed to own them. So you're often annotating the wrong thing; >> > it's not a strdup() call which is buggy and leaking, but the >> > function five layers up the stack which was supposed to take >> > ownership and didn't. >> >> As noted in [3] this case is because the LSAN suppressions list was in >> play, so we excluded the meaningful part of the stack trace (which is >> shown in that mail). > > I don't think that's true at all. The annotations you showed in that > message, for example, are pointing at add_rev_cmdline(). But that is > _not_ the source of the leak, nor where it should be fixed. It is a > necessary part of how rev_info works. The leak is when the rev_info goes > out of scope without anybody cleaning it up. > >> Hrm, now that I think about it I think that the cases where I had to >> resort to valgrind to get around such crappy stacktraces (when that was >> all I got, even without suppressions) were probably all because there >> was an UNLEAK() in play... > > I don't see how UNLEAK() would impact stack traces. It should either > make something not-leaked-at-all (in which case LSan will no longer > mention it), or it does nothing (it throws some wasted memory into a > structure which is itself not leaked). Yes, I think either categorically wrong here, or it applies to some other case I wasn't able to dig up. Or maybe not, doesn't Taylor's example take it from "Direct leak" to "Indirect leak" with the suppression in play? I think those were related somehow (but don't have that in front of me as I type this out). E.g. (to reinforce your point) try compiling with SANITIZE=leak and running: $ TZ=UTC t/helper/test-tool date show:format:%z 1466000000 +0200 1466000000 -> +0000 +0200 -> +0000 ================================================================= ==335188==ERROR: LeakSanitizer: detected memory leaks Direct leak of 3 byte(s) in 1 object(s) allocated from: #0 0x7f31cdd21db0 in __interceptor_malloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:54 #1 0x7f31cdb04e4a in __GI___strdup string/strdup.c:42 SUMMARY: LeakSanitizer: 3 byte(s) leaked in 1 allocation(s). Now try with "SANITIZE=" and valgrind can give you the "real" source: $ TZ=UTC valgrind --leak-check=full t/helper/test-tool date show:format:%z 1466000000 +0200 ==336515== Memcheck, a memory error detector ==336515== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al. ==336515== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info ==336515== Command: t/helper/test-tool date show:format:%z 1466000000 +0200 ==336515== 1466000000 -> +0000 +0200 -> +0000 ==336515== ==336515== HEAP SUMMARY: ==336515== in use at exit: 1,208 bytes in 13 blocks ==336515== total heap usage: 50 allocs, 37 frees, 27,593 bytes allocated ==336515== ==336515== 3 bytes in 1 blocks are definitely lost in loss record 1 of 13 ==336515== at 0x483877F: malloc (vg_replace_malloc.c:307) ==336515== by 0x49C6E4A: strdup (strdup.c:42) ==336515== by 0x261E3A: xstrdup (wrapper.c:29) ==336515== by 0x152785: parse_date_format (date.c:991) ==336515== by 0x116A34: show_dates (test-date.c:39) ==336515== by 0x116DE9: cmd__date (test-date.c:116) ==336515== by 0x115227: cmd_main (test-tool.c:127) ==336515== by 0x115334: main (common-main.c:52) ==336515== ==336515== LEAK SUMMARY: ==336515== definitely lost: 3 bytes in 1 blocks ==336515== indirectly lost: 0 bytes in 0 blocks ==336515== possibly lost: 0 bytes in 0 blocks ==336515== still reachable: 1,205 bytes in 12 blocks ==336515== suppressed: 0 bytes in 0 blocks ==336515== Reachable blocks (those to which a pointer was found) are not shown. ==336515== To see them, rerun with: --leak-check=full --show-leak-kinds=all ==336515== ==336515== For lists of detected and suppressed errors, rerun with: -s ==336515== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0) I don't know why it habitually loses track of seemingly easy-to-track leaks sometimes, but it does. There isn't an UNLEAK() in play here. When I run into these I try it with valgrind, fix the leak, then test again with SANITIZE=leak. I haven't found cases where it actually misses things, just cases where the stack traces suck.
On Wed, Oct 27, 2021 at 10:57:52PM +0200, Ævar Arnfjörð Bjarmason wrote: > > So it's not the "container" element, but rather it can be a problem if > > people annotate too broadly (you will miss some leaks). In the case of > > rev_info, there is no way to _not_ leak right now, because it has no > > cleanup function. > > It doesn't have one, but there are uses of setup_revisions() and > rev_info usage that don't leak, as that builtin/rev-list.c case shows. > > I mean, in that case it's not doing much of anything, but at least we > test that setup_revisions() itself doesn't leak right now, but wouldn't > with UNLEAK(). I don't think that's true. If you UNLEAK() the rev_info in the caller, then it will only affect allocations that are still reachable from rev_info. I.e., things that are by definition not a leak in setup_revisions(). Now you could argue that setup_revisions() is "leaking" by allocating things and stuffing them into rev_info that it should not be. But we can never know that until we have an actual function that cleans up a rev_info, which defines what it's "supposed" to have ownership of. Maybe we have callers that explicitly try to de-allocate bits of the rev_info. But IMHO that is the source of the whole problem: how is random code using rev_info supposed to know which of its internal details are owned or not? This should be documented and enforced with a single function. > So just FWIW I'm not saying "hey can we hold off on that UNLEAK() for > far future xyz", but for a thing I've got queued up that I'd rather not > start rewriting... Just to be clear: I am totally fine with dropping Taylor's UNLEAK patches (as I've said already). I was only arguing here that annotating via external files is worse than just adding an UNLEAK(). I'm also trying to combat what I see as mis-conceptions or inaccuracies about what UNLEAK() does or its implications (or even what counts as a "leak"). But I hope in the long run that we don't need _any_ kind of annotation, because we'll actually be leak-free. And then we don't have to care about any of this. > > I don't see how UNLEAK() would impact stack traces. It should either > > make something not-leaked-at-all (in which case LSan will no longer > > mention it), or it does nothing (it throws some wasted memory into a > > structure which is itself not leaked). > > Yes, I think either categorically wrong here, or it applies to some > other case I wasn't able to dig up. Or maybe not, doesn't Taylor's > example take it from "Direct leak" to "Indirect leak" with the > suppression in play? I think those were related somehow (but don't have > that in front of me as I type this out). I don't think UNLEAK() can move something from "direct" to "indirect" in LSan's terminology. If rev_info points to an array of structs, and those structs point to allocated strings, then the array itself is a "direct" leak, and the strings are "indirect" (they are leaked, but presumably fixing the direct leak would also deallocate them). If UNLEAK() makes the array not-leaked, then those indirect leaks don't become direct. They should be transitively not-leaked, too. > E.g. (to reinforce your point) try compiling with SANITIZE=leak and running: > > $ TZ=UTC t/helper/test-tool date show:format:%z 1466000000 +0200 > 1466000000 -> +0000 > +0200 -> +0000 > > ================================================================= > ==335188==ERROR: LeakSanitizer: detected memory leaks > > Direct leak of 3 byte(s) in 1 object(s) allocated from: > #0 0x7f31cdd21db0 in __interceptor_malloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:54 > #1 0x7f31cdb04e4a in __GI___strdup string/strdup.c:42 > > SUMMARY: LeakSanitizer: 3 byte(s) leaked in 1 allocation(s). So these should be real leaks. Of course with the lousy stack trace it's hard to see what they are. But I don't see how UNLEAK() is responsible for making the lousy stack trace. You could try compiling with LSan but _not_ -DSUPPRESS_ANNOTATED_LEAKS and see if the result is similarly bad (but I expect it to be, since test-date.c does not have any UNLEAK() calls in it). -Peff
On Fri, Oct 29, 2021 at 04:56:31PM -0400, Jeff King wrote: > > E.g. (to reinforce your point) try compiling with SANITIZE=leak and running: > > > > $ TZ=UTC t/helper/test-tool date show:format:%z 1466000000 +0200 > > 1466000000 -> +0000 > > +0200 -> +0000 > > > > ================================================================= > > ==335188==ERROR: LeakSanitizer: detected memory leaks > > > > Direct leak of 3 byte(s) in 1 object(s) allocated from: > > #0 0x7f31cdd21db0 in __interceptor_malloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:54 > > #1 0x7f31cdb04e4a in __GI___strdup string/strdup.c:42 > > > > SUMMARY: LeakSanitizer: 3 byte(s) leaked in 1 allocation(s). > > So these should be real leaks. Of course with the lousy stack trace it's > hard to see what they are. But I don't see how UNLEAK() is responsible > for making the lousy stack trace. You could try compiling with LSan but > _not_ -DSUPPRESS_ANNOTATED_LEAKS and see if the result is similarly bad > (but I expect it to be, since test-date.c does not have any UNLEAK() > calls in it). I just tested this, and it is unrelated to UNLEAK(). Interestingly, compiling with SANITIZE=address does give the correct stack trace, so I think LSan is just buggy here for some reason. We usually suppress leaks with ASAN_OPTIONS because there are so many (and we want to get at the more important signal of actual memory errors). But it wouldn't be hard to have that setting respect your PASSING_SANITIZE_LEAK variables. The big downside is that ASan runs of the test suite are much more expensive. Hmm. A little googling turns up LSan's fast_unwind_on_malloc option. And indeed: $ LSAN_OPTIONS=fast_unwind_on_malloc=0 t/helper/test-tool date show:format:%z 1466000000 +0200 1466000000 -> +0000 +0200 -> +0000 ================================================================= ==39628==ERROR: LeakSanitizer: detected memory leaks Direct leak of 3 byte(s) in 1 object(s) allocated from: #0 0x7fa664e39b94 in __interceptor_malloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:56 #1 0x7fa664c124aa in __GI___strdup string/strdup.c:42 #2 0x557636c40c2e in xstrdup /home/peff/compile/git/wrapper.c:29 #3 0x557636b2ae52 in parse_date_format /home/peff/compile/git/date.c:991 #4 0x557636aedb0e in show_dates t/helper/test-date.c:39 #5 0x557636aedee1 in cmd__date t/helper/test-date.c:116 #6 0x557636aec1f0 in cmd_main t/helper/test-tool.c:127 #7 0x557636aec30d in main /home/peff/compile/git/common-main.c:52 #8 0x7fa664babe49 in __libc_start_main ../csu/libc-start.c:314 #9 0x557636aebec9 in _start (/home/peff/compile/git/t/helper/test-tool+0xcec9) SUMMARY: LeakSanitizer: 3 byte(s) leaked in 1 allocation(s). So perhaps we ought to be setting that by default. -Peff
diff --git a/t/t0056-git-C.sh b/t/t0056-git-C.sh index 2630e756dab..490aefa81a1 100755 --- a/t/t0056-git-C.sh +++ b/t/t0056-git-C.sh @@ -2,7 +2,11 @@ test_description='"-C <path>" option and its effects on other path-related options' +TEST_PASSES_SANITIZE_LEAK=true . ./test-lib.sh +todo_leaks <<-\EOF +^cmd_log_init_finish$ +EOF test_expect_success '"git -C <path>" runs git from the directory <path>' ' test_create_repo dir1 && diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh index eef2262a360..d89bf5da7dc 100644 --- a/t/test-lib-functions.sh +++ b/t/test-lib-functions.sh @@ -243,6 +243,48 @@ debug () { done } +# Declare known "general" memory leaks, for use with TEST_PASSES_SANITIZE_LEAK=true. +# +# Matches lines in a stack trace that leaks. Intended for +# LSAN_OPTIONS, but the format is intended to be easy to use with +# other leak checkers, so the "leak:" prefix is omitted (and added for +# you). +# +# Use it immediately after sourcing test-lib.sh (or equivalent), and +# after a "TEST_PASSES_SANITIZE_LEAK=true" has been set. E.g: +# +# TEST_PASSES_SANITIZE_LEAK=true +# . ./test-lib.sh +# todo_leaks <<-\EOF +# ^cmd_log_init_finish$ +# EOF +# +# The "^" and "$" anchors don't suggest full regex syntax support, +# that's the only anchoring (or other metacharacter) understood by +# LSAN_OPTIONS,. +# +# See +# https://github.com/google/sanitizers/wiki/AddressSanitizerLeakSanitizer#suppressions +# for the relevant LSAN_OPTIONS documentation. +todo_leaks () { + if ! test_have_prereq SANITIZE_LEAK + then + return 0 + fi + + # Try not to interfere with any test logic + suppressions=.lsan-suppressions.txt + if test -d .git + then + suppressions=".git/$suppressions" + fi + suppressions="$PWD/$suppressions" + + sed 's/^/leak:/' >"$suppressions" && + LSAN_OPTIONS="$LSAN_OPTIONS:suppressions=\"$suppressions\"" && + export LSAN_OPTIONS +} + # Usage: test_commit [options] <message> [<file> [<contents> [<tag>]]] # -C <dir>: # Run all git commands in directory <dir>