mbox series

[00/11] midx: clean up t5319 under 'SANITIZE=leak'

Message ID cover.1634787555.git.me@ttaylorr.com (mailing list archive)
Headers show
Series midx: clean up t5319 under 'SANITIZE=leak' | expand

Message

Taylor Blau Oct. 21, 2021, 3:39 a.m. UTC
Here is a topic that Ævar got me interested when he mentioned that t5319 is
leaky[1].

This is the result of trying to get t5319 to pass when compiling with
SANITIZE=leak. About 50% of the fixes are in the MIDX code, another 40% are in
the pack-bitmap code, and the remaining 10% are sort of random.

I tried to separate these out based on their respective areas. The last 10% are
all from leaking memory in the rev_info structure, which I punted on for now by
just UNLEAK()-ing it. That's all done in the last patch. If we choose to take
that last patch, then t5319 passes even under SANITIZE=leak builds. But it's a
little gross, so I'm happy to leave it out if others would prefer.

This is based on tb/fix-midx-rename-while-mapped, with a back-merge of
js/windows-ci-path-fix in order to appease CI. It also contains a little bit of
overlap with a patch[2] from Ævar in another series, which he and I discovered
independently. I cherry-picked his patch, since I haven't seen much action on
that series lately.

[1]: https://lore.kernel.org/git/87wnmph73b.fsf@evledraar.gmail.com/
[2]: https://lore.kernel.org/git/patch-v3-08.10-e0a3510dd88-20211013T222329Z-avarab@gmail.com/

Taylor Blau (10):
  midx.c: clean up chunkfile after reading the MIDX
  midx.c: don't leak MIDX from verify_midx_file
  t/helper/test-read-midx.c: free MIDX within read_midx_file()
  builtin/pack-objects.c: don't leak memory via arguments
  builtin/repack.c: avoid leaking child arguments
  builtin/multi-pack-index.c: don't leak concatenated options
  pack-bitmap.c: avoid leaking via midx_bitmap_filename()
  pack-bitmap.c: don't leak type-level bitmaps
  pack-bitmap.c: more aggressively free in free_bitmap_index()
  t5319: UNLEAK() the remaining leaks

Ævar Arnfjörð Bjarmason (1):
  pack-bitmap-write.c: don't return without stop_progress()

 builtin/log.c               |  1 +
 builtin/multi-pack-index.c  |  4 ++++
 builtin/pack-objects.c      | 12 ++++++++----
 builtin/repack.c            | 35 ++++++++++++++++++++++++-----------
 builtin/rev-list.c          |  2 ++
 midx.c                      |  7 +++++--
 pack-bitmap-write.c         |  8 +++++---
 pack-bitmap.c               | 18 ++++++++++++++++--
 t/helper/test-read-midx.c   |  3 ++-
 t/t5319-multi-pack-index.sh |  1 +
 10 files changed, 68 insertions(+), 23 deletions(-)

Comments

Ævar Arnfjörð Bjarmason Oct. 21, 2021, 11:50 a.m. UTC | #1
On Wed, Oct 20 2021, Taylor Blau wrote:

> I tried to separate these out based on their respective areas. The last 10% are
> all from leaking memory in the rev_info structure, which I punted on for now by
> just UNLEAK()-ing it. That's all done in the last patch. If we choose to take
> that last patch, then t5319 passes even under SANITIZE=leak builds. But it's a
> little gross, so I'm happy to leave it out if others would prefer.

I'll defer to you, but I've got a mild preference for keeping it out. An
upcoming series of patches I'm submitting (I'm waiting on the current
leak fixes to land) will fix most of those rev_info leaks. So not having
UNLEAK() markings in-flight makes things a bit easier to manage.
Derrick Stolee Oct. 21, 2021, 1:37 p.m. UTC | #2
On 10/20/2021 11:39 PM, Taylor Blau wrote:
> Here is a topic that Ævar got me interested when he mentioned that t5319 is
> leaky[1].
> 
> This is the result of trying to get t5319 to pass when compiling with
> SANITIZE=leak. About 50% of the fixes are in the MIDX code, another 40% are in
> the pack-bitmap code, and the remaining 10% are sort of random.
> 
> I tried to separate these out based on their respective areas. The last 10% are
> all from leaking memory in the rev_info structure, which I punted on for now by
> just UNLEAK()-ing it. That's all done in the last patch. If we choose to take
> that last patch, then t5319 passes even under SANITIZE=leak builds. But it's a
> little gross, so I'm happy to leave it out if others would prefer.
> 
> This is based on tb/fix-midx-rename-while-mapped, with a back-merge of
> js/windows-ci-path-fix in order to appease CI. It also contains a little bit of
> overlap with a patch[2] from Ævar in another series, which he and I discovered
> independently. I cherry-picked his patch, since I haven't seen much action on
> that series lately.

In my reading, I only found one nitpick that wasn't already caught by
other reviewers. Feel free to ignore my comment as it was just something
that caused me to pause while reading, but isn't that unusual.

Thanks,
-Stolee
Taylor Blau Oct. 22, 2021, 4:39 a.m. UTC | #3
On Thu, Oct 21, 2021 at 01:50:55PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
> On Wed, Oct 20 2021, Taylor Blau wrote:
>
> > I tried to separate these out based on their respective areas. The last 10% are
> > all from leaking memory in the rev_info structure, which I punted on for now by
> > just UNLEAK()-ing it. That's all done in the last patch. If we choose to take
> > that last patch, then t5319 passes even under SANITIZE=leak builds. But it's a
> > little gross, so I'm happy to leave it out if others would prefer.
>
> I'll defer to you, but I've got a mild preference for keeping it out. An
> upcoming series of patches I'm submitting (I'm waiting on the current
> leak fixes to land) will fix most of those rev_info leaks. So not having
> UNLEAK() markings in-flight makes things a bit easier to manage.

If you don't mind, I think keeping this in (as a way to verify that we
really did make t5319 leak-free) may be good for reviewers. When you
clean these areas up, you'll have to remember to remove those UNLEAK()s,
but hopefully they produce conflicts with your in-flight work that serve
as not-too-annoying reminders.

I would certainly rather not have to UNLEAK() those at all, so I am very
excited to see a series from you which handles freeing these resources
appropriately. It was just too big a bite for me to chew off when
preparing this quick series.

Thanks,
Taylor
Ævar Arnfjörð Bjarmason Oct. 22, 2021, 8:23 a.m. UTC | #4
On Fri, Oct 22 2021, Taylor Blau wrote:

> On Thu, Oct 21, 2021 at 01:50:55PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Wed, Oct 20 2021, Taylor Blau wrote:
>>
>> > I tried to separate these out based on their respective areas. The last 10% are
>> > all from leaking memory in the rev_info structure, which I punted on for now by
>> > just UNLEAK()-ing it. That's all done in the last patch. If we choose to take
>> > that last patch, then t5319 passes even under SANITIZE=leak builds. But it's a
>> > little gross, so I'm happy to leave it out if others would prefer.
>>
>> I'll defer to you, but I've got a mild preference for keeping it out. An
>> upcoming series of patches I'm submitting (I'm waiting on the current
>> leak fixes to land) will fix most of those rev_info leaks. So not having
>> UNLEAK() markings in-flight makes things a bit easier to manage.
>
> If you don't mind, I think keeping this in (as a way to verify that we
> really did make t5319 leak-free) may be good for reviewers. When you
> clean these areas up, you'll have to remember to remove those UNLEAK()s,
> but hopefully they produce conflicts with your in-flight work that serve
> as not-too-annoying reminders.
>
> I would certainly rather not have to UNLEAK() those at all, so I am very
> excited to see a series from you which handles freeing these resources
> appropriately. It was just too big a bite for me to chew off when
> preparing this quick series.

Having looked at this a bit more carefully I've upgraded that a bit from
"I'll defer to you" to strongly objecting to this specific approach (but
read to the end, I think you can have your cake and eat it too).

I just glanced at this series in my previous pass-over, and evidently
didn't read the CL to the end. I thought based on the commit subject
that it was just a change to some midx code impacting t5319.

But it's a bigger change across the whole test suite than that.

If you run:

    rm .prove; GIT_SKIP_TESTS=t0027 prove -j8 --state=save t[0-9]*.sh :: --immediate

Without this patch (t0027 will take forever), and then:

    GIT_SKIP_TESTS=t0027 prove -j8 --state=failed

You'll see that (I use GNU screen's logging feature to get the output)
we made these tests pass, not just your t5319:
    
    $ grep -w ok screenlog.11 
    t0056-git-C.sh .................................... ok                  
    t1060-object-corruption.sh ........................ ok                  
    t4212-log-corrupt.sh .............................. ok                  
    t4207-log-decoration-colors.sh .................... ok                  
    t5313-pack-bounds-checks.sh ....................... ok                  
    t5506-remote-groups.sh ............................ ok                  
    t5513-fetch-track.sh .............................. ok                  
    t5319-multi-pack-index.sh ......................... ok                  
    t5532-fetch-proxy.sh .............................. ok                  
    t5900-repo-selection.sh ........................... ok                  
    t6011-rev-list-with-bad-commit.sh ................. ok                  
    t6100-rev-list-in-order.sh ........................ ok                  
    t6114-keep-packs.sh ............................... ok                  
    t6131-pathspec-icase.sh ........................... ok                  
    t6003-rev-list-topo-order.sh ...................... ok                  
    t7702-repack-cyclic-alternate.sh .................. ok                  
    t9108-git-svn-glob.sh ............................. ok                  
    t9109-git-svn-multi-glob.sh ....................... ok                  
    t9127-git-svn-partial-rebuild.sh .................. ok                  
    t9133-git-svn-nested-git-repo.sh .................. ok                  
    t9168-git-svn-partially-globbed-names.sh .......... ok                  

I've got subsequent patches on top of what's now in-flight that fix
those bigger leaks incrementally, and as part of that add
"TEST_PASSES_SANITIZE_LEAK=true" to /all/ the relevant passing
tests.

I.e. there isn't a 1=1 correspondance between all current passing tests
and "TEST_PASSES_SANITIZE_LEAK=true" markings (some are still left
unmarked). But there has been a 1=1 mapping between freshly pasing tests
as a result of leak fixes and tests that get that
"TEST_PASSES_SANITIZE_LEAK=true" marking.

I think it helps a lot to review those patches & assess their impact if
code changes can be paired with relevant test suite changes, which isn't
the case if we've got simultaneous UNLEAK() markings for major leaks
in-flight.

So the practical effect of that is that I'll need to rebase all that,
change my testing setup to test those one-at-at-time with "git rebase -i
--exec" before submission (which I do anyway), but now with some
selective revert of an earlier UNLEAK() to assert that I'm still fixing
the things I expected.

Then either drop the parts of the commit messages that say "this fixes
leak XYZ, making tests A, B, C pass" to omit any mention of "A, B, C",
or reword it for the earlier now-landed UNLEAK() markings.

These are also just the fully passing tests I ran as a one-off for this
E-Mail. For those I do more exhaustive runs where I'll e.g. spot if a
test goes from N failing to N-X, and note it if that gets us much closer
to 0.

But on the "cake and eat it too" front I also realize that it sucks to
not be able to mark tests you made leak-free as passing just because
I've got some larger more general leak fixes planned, and even if none
of your code leaks, it might just be "git log" or whatever.

But we can just use LSAN_OPTIONS with a suppressions file for that. This
diff below (assume my SOB yadayada) makes your tests pass if I eject the
11/11 and replace that with this:

diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index a3c72b68f7c..3f6d716f825 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -1,8 +1,23 @@
 #!/bin/sh
 
 test_description='multi-pack-indexes'
+
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
+test_expect_success SANITIZE_LEAK 'setup LSAN whitelisting' '
+	suppressions=".git/lsan-suppressions.txt" &&	    
+	cat >"$suppressions" <<-\EOF &&
+	leak:add_rev_cmdline
+	leak:cmd_log_init_finish
+	leak:prep_parse_options
+	leak:prepare_revision_walk
+	leak:strdup
+	EOF
+	LSAN_OPTIONS="$LSAN_OPTIONS:suppressions=\"$PWD/$suppressions\"" &&
+	export LSAN_OPTIONS
+'
+
 GIT_TEST_MULTI_PACK_INDEX=0
 objdir=.git/objects
 
I think that would be a much better approach here, and would really
prefer if we went for some version of that if you really want to test
these right away with the linux-leaks job.

I think the better thing to do here is to just leave it, i.e. we've got
tons of these tests that don't leak themselves, but leak due to "git
log" or whatever already, but anyway, if what you're doing in
t5319-multi-pack-index.sh doesn't cause much more work for me as noted
above I really don't mind.

If you want to pick that approach up and run with it I think it would
probably make sense to factor that suggested test_expect_success out
into a function in test-lib-functions.sh or whatever, and call it as
e.g.:
    
    TEST_PASSES_SANITIZE_LEAK=true
     . ./test-lib.sh
    declare_known_leaks <<-\EOF
    add_rev_cmdline
    [...]
    EOF

Then pipe it through sed 's/^/leak:/' and have it set LSAN_OPTIONS for
you.

Doing it that way would be less boilerplate for each test that wants it,
and is also more likely to work with other non-LSAN leak appoaches,
i.e. as long as something can take a list of lines matching stack traces
we can feed that to that leak checker's idea of a whitelist.