diff mbox series

[04/12] bloom: clear each bloom_key after use

Message ID 9ae15b94881369fa1cbd09fc2de9cc94c30edb2d.1617994052.git.gitgitgadget@gmail.com (mailing list archive)
State New
Headers show
Series Fix all leaks in tests t0002-t0099: Part 1 | expand

Commit Message

Andrzej Hunt April 9, 2021, 6:47 p.m. UTC
From: Andrzej Hunt <ajrhunt@google.com>

fill_bloom_key() allocates memory into bloom_key, we need to clean that
up once the key is no longer needed.

This fixes the following leak which was found while running t0002-t0099.
Although this leak is happening in code being called from a test-helper,
the same code is also used in various locations around git, and could
presumably happen during normal usage too.

Direct leak of 308 byte(s) in 11 object(s) allocated from:
    #0 0x49a5e2 in calloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:154:3
    #1 0x6f4032 in xcalloc wrapper.c:140:8
    #2 0x4f2905 in fill_bloom_key bloom.c:137:28
    #3 0x4f34c1 in get_or_compute_bloom_filter bloom.c:284:4
    #4 0x4cb484 in get_bloom_filter_for_commit t/helper/test-bloom.c:43:11
    #5 0x4cb072 in cmd__bloom t/helper/test-bloom.c:97:3
    #6 0x4ca7ef in cmd_main t/helper/test-tool.c:121:11
    #7 0x4caace in main common-main.c:52:11
    #8 0x7f798af95349 in __libc_start_main (/lib64/libc.so.6+0x24349)

SUMMARY: AddressSanitizer: 308 byte(s) leaked in 11 allocation(s).

Signed-off-by: Andrzej Hunt <ajrhunt@google.com>
---
 bloom.c | 1 +
 1 file changed, 1 insertion(+)

Comments

SZEDER Gábor April 11, 2021, 7:26 a.m. UTC | #1
On Fri, Apr 09, 2021 at 06:47:23PM +0000, Andrzej Hunt via GitGitGadget wrote:
> From: Andrzej Hunt <ajrhunt@google.com>
> 
> fill_bloom_key() allocates memory into bloom_key, we need to clean that
> up once the key is no longer needed.
> 
> This fixes the following leak which was found while running t0002-t0099.
> Although this leak is happening in code being called from a test-helper,
> the same code is also used in various locations around git, and could
> presumably happen during normal usage too.

It does indeed happen: 'git commit-graph write --reachable
--changed-paths' generates Bloom filters for every commit, with each
filter containing all paths modified by its associated commit, so it
leaks a lot of 7 * 4byte hashes.  This patch reduces the memory usage
of that command:

                         Max RSS
                    before      after
  ---------------------------------------------
  android-base     1275028k   1006576k   -21.1%
  chromium         3245144k   3127764k    -3.6%
  cmssw             793996k    699156k   -12.0%
  cpython           371584k    343480k    -7.6%
  elasticsearch     748104k    637936k   -14.7%
  freebsd-src       819020k    741272k    -9.5%
  gcc               867412k    730332k   -15.8%
  gecko-dev        2619112k   2457280k    -6.2%
  git               252684k    216900k   -14.2%
  glibc             239000k    222228k    -7.0%
  go                264132k    251344k    -4.9%
  homebrew-cask     542188k    480588k   -11.4%
  homebrew-core     805332k    715848k   -11.1%
  jdk               417832k    342928k   -17.9%
  libreoff-core    1257296k   1089980k   -13.3%
  linux            2033296k   1759712k   -13.5%
  llvm-project     1067216k    956704k   -10.4%
  mariadb-srv       695172k    559508k   -19.5%
  postgres          340132k    317416k    -6.7%
  rails             325432k    294332k    -9.6%
  rust              655244k    584904k   -10.7%
  tensorflow        507308k    480848k    -5.2%
  webkit           2466812k   2237332k    -9.3%

Just out of curiosity, I disabled the questionable hardcoded 512 paths
limit on the size of modified path Bloom filters, and the memory usage
in the jdk repository sunk by over 55%, from 849520k to 379760k.

Please feel free to include any of the above data points in the commit
message.

> Direct leak of 308 byte(s) in 11 object(s) allocated from:
>     #0 0x49a5e2 in calloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:154:3
>     #1 0x6f4032 in xcalloc wrapper.c:140:8
>     #2 0x4f2905 in fill_bloom_key bloom.c:137:28
>     #3 0x4f34c1 in get_or_compute_bloom_filter bloom.c:284:4
>     #4 0x4cb484 in get_bloom_filter_for_commit t/helper/test-bloom.c:43:11
>     #5 0x4cb072 in cmd__bloom t/helper/test-bloom.c:97:3
>     #6 0x4ca7ef in cmd_main t/helper/test-tool.c:121:11
>     #7 0x4caace in main common-main.c:52:11
>     #8 0x7f798af95349 in __libc_start_main (/lib64/libc.so.6+0x24349)
> 
> SUMMARY: AddressSanitizer: 308 byte(s) leaked in 11 allocation(s).
> 
> Signed-off-by: Andrzej Hunt <ajrhunt@google.com>
> ---
>  bloom.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/bloom.c b/bloom.c
> index 52b87474c6eb..5e297038bb1f 100644
> --- a/bloom.c
> +++ b/bloom.c
> @@ -283,6 +283,7 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r,
>  			struct bloom_key key;
>  			fill_bloom_key(e->path, strlen(e->path), &key, settings);
>  			add_key_to_filter(&key, filter, settings);
> +			clear_bloom_key(&key);
>  		}
>  
>  	cleanup:
> -- 
> gitgitgadget
>
Andrzej Hunt April 25, 2021, 1:17 p.m. UTC | #2
On 11/04/2021 09:26, SZEDER Gábor wrote:
> On Fri, Apr 09, 2021 at 06:47:23PM +0000, Andrzej Hunt via GitGitGadget wrote:
>> From: Andrzej Hunt <ajrhunt@google.com>
>>
>> fill_bloom_key() allocates memory into bloom_key, we need to clean that
>> up once the key is no longer needed.
>>
>> This fixes the following leak which was found while running t0002-t0099.
>> Although this leak is happening in code being called from a test-helper,
>> the same code is also used in various locations around git, and could
>> presumably happen during normal usage too.
> 
> It does indeed happen: 'git commit-graph write --reachable
> --changed-paths' generates Bloom filters for every commit, with each
> filter containing all paths modified by its associated commit, so it
> leaks a lot of 7 * 4byte hashes.  This patch reduces the memory usage
> of that command:
> 
>                           Max RSS
>                      before      after
>    ---------------------------------------------
>    android-base     1275028k   1006576k   -21.1%
>    chromium         3245144k   3127764k    -3.6%
>    cmssw             793996k    699156k   -12.0%
>    cpython           371584k    343480k    -7.6%
>    elasticsearch     748104k    637936k   -14.7%
>    freebsd-src       819020k    741272k    -9.5%
>    gcc               867412k    730332k   -15.8%
>    gecko-dev        2619112k   2457280k    -6.2%
>    git               252684k    216900k   -14.2%
>    glibc             239000k    222228k    -7.0%
>    go                264132k    251344k    -4.9%
>    homebrew-cask     542188k    480588k   -11.4%
>    homebrew-core     805332k    715848k   -11.1%
>    jdk               417832k    342928k   -17.9%
>    libreoff-core    1257296k   1089980k   -13.3%
>    linux            2033296k   1759712k   -13.5%
>    llvm-project     1067216k    956704k   -10.4%
>    mariadb-srv       695172k    559508k   -19.5%
>    postgres          340132k    317416k    -6.7%
>    rails             325432k    294332k    -9.6%
>    rust              655244k    584904k   -10.7%
>    tensorflow        507308k    480848k    -5.2%
>    webkit           2466812k   2237332k    -9.3%
> 
> Just out of curiosity, I disabled the questionable hardcoded 512 paths
> limit on the size of modified path Bloom filters, and the memory usage
> in the jdk repository sunk by over 55%, from 849520k to 379760k.
> 
> Please feel free to include any of the above data points in the commit
> message.

Thank you for the detailed analysis - these kinds of results are very 
motivating! I will include a brief summary (something like "10% typical 
improvement for 'commit-graph write' for large repos") along with a link 
to your posting for those who want the full picture.
diff mbox series

Patch

diff --git a/bloom.c b/bloom.c
index 52b87474c6eb..5e297038bb1f 100644
--- a/bloom.c
+++ b/bloom.c
@@ -283,6 +283,7 @@  struct bloom_filter *get_or_compute_bloom_filter(struct repository *r,
 			struct bloom_key key;
 			fill_bloom_key(e->path, strlen(e->path), &key, settings);
 			add_key_to_filter(&key, filter, settings);
+			clear_bloom_key(&key);
 		}
 
 	cleanup: