mbox series

[v3,0/8] add a test mode for SANITIZE=leak, run it in CI

Message ID cover-v3-0.8-00000000000-20210831T132546Z-avarab@gmail.com (mailing list archive)
Headers show
Series add a test mode for SANITIZE=leak, run it in CI | expand

Message

Ævar Arnfjörð Bjarmason Aug. 31, 2021, 1:35 p.m. UTC
We can compile git with SANITIZE=leak, and have had various efforts in
the past such as 31f9acf9ce2 (Merge branch 'ah/plugleaks', 2021-08-04)
to plug memory leaks, but have had no CI testing of it to ensure that
we don't get regressions. This series adds a GIT_TEST_* mode for
checking those regressions, and runs it in CI.

Since I submitted v2 the delta between origin/master..origin/seen
broke even t0001-init.sh when run under SANITIZE=leak, so this series
will cause test smoke on "seen". It's a prblom with another topic[1]
though, which I and Emily will fix.

Changes since v2:

 * In v2 compiling with SANITIZE=leak would change things so only
   known-good passing tests were run by default, everything else would
   pass as a dummy. Now the default running of tests is unchanged, but
   if we run with GIT_TEST_PASSING_SANITIZE_LEAK=true only those tests
   are run which set and export TEST_PASSES_SANITIZE_LEAK=true.

 * The facility for declaring known-good tests in test-lib.sh based on
   wildcards is gone, instead individual tests need to declare if
   they're OK under SANITIZE=leak. This is done via "export
   TEST_PASSES_SANITIZE_LEAK=true", there's a handy import of
   "./test-pragma-SANITIZE=leak-ok.sh" before sourcing "./test-lib.sh"
   itself to set this.

 * The various leak fixes are gone entirely. I'll submit some of those
   independently or as follow-ups.

 * We now mark 57 tests in the test suite as OK under
   SANITIZE=leak. This is far from all of them, but gives us a decent
   set to start out with. The largest chunk of these is in t0*.sh.

 * The CI job is not run under both GCC and Clang, but just whatever
   with one default compiler (which happens to be GCC). I'd missed
   that running under both was pointless.

   It would be meaningful to run this under e.g. OSX & Windows too, as
   we take different codepaths there, but that can be left to a
   follow-up series.

1. https://lore.kernel.org/git/8735qvyw0p.fsf@evledraar.gmail.com/

Ævar Arnfjörð Bjarmason (8):
  Makefile: add SANITIZE=leak flag to GIT-BUILD-OPTIONS
  CI: refactor "if" to "case" statement
  tests: add a test mode for SANITIZE=leak, run it in CI
  tests: annotate t000*.sh with TEST_PASSES_SANITIZE_LEAK=true
  tests: annotate t001*.sh with TEST_PASSES_SANITIZE_LEAK=true
  tests: annotate t002*.sh with TEST_PASSES_SANITIZE_LEAK=true
  tests: annotate select t0*.sh with TEST_PASSES_SANITIZE_LEAK=true
  tests: annotate select t*.sh with TEST_PASSES_SANITIZE_LEAK=true

 .github/workflows/main.yml              |  2 ++
 Makefile                                |  5 +++++
 ci/install-dependencies.sh              |  4 ++--
 ci/lib.sh                               | 29 +++++++++++++++++--------
 ci/run-build-and-tests.sh               |  4 ++--
 t/README                                |  7 ++++++
 t/t0000-basic.sh                        |  1 +
 t/t0001-init.sh                         |  1 +
 t/t0002-gitfile.sh                      |  1 +
 t/t0003-attributes.sh                   |  1 +
 t/t0004-unwritable.sh                   |  3 ++-
 t/t0005-signals.sh                      |  2 ++
 t/t0007-git-var.sh                      |  2 ++
 t/t0008-ignores.sh                      |  1 +
 t/t0010-racy-git.sh                     |  1 +
 t/t0011-hashmap.sh                      |  1 +
 t/t0013-sha1dc.sh                       |  1 +
 t/t0016-oidmap.sh                       |  1 +
 t/t0017-env-helper.sh                   |  1 +
 t/t0018-advice.sh                       |  1 +
 t/t0022-crlf-rename.sh                  |  1 +
 t/t0024-crlf-archive.sh                 |  1 +
 t/t0025-crlf-renormalize.sh             |  1 +
 t/t0026-eol-config.sh                   |  1 +
 t/t0029-core-unsetenvvars.sh            |  1 +
 t/t0030-stripspace.sh                   |  1 +
 t/t0052-simple-ipc.sh                   |  1 +
 t/t0061-run-command.sh                  |  1 +
 t/t0063-string-list.sh                  |  1 +
 t/t0066-dir-iterator.sh                 |  1 +
 t/t0067-parse_pathspec_file.sh          |  1 +
 t/t0091-bugreport.sh                    |  1 +
 t/t1010-mktree.sh                       |  1 +
 t/t1100-commit-tree-options.sh          |  1 +
 t/t1308-config-set.sh                   |  1 +
 t/t1309-early-config.sh                 |  1 +
 t/t1420-lost-found.sh                   |  1 +
 t/t1430-bad-ref-name.sh                 |  1 +
 t/t1509-root-work-tree.sh               |  1 +
 t/t2002-checkout-cache-u.sh             |  1 +
 t/t2050-git-dir-relative.sh             |  1 +
 t/t2081-parallel-checkout-collisions.sh |  1 +
 t/t2100-update-cache-badpath.sh         |  1 +
 t/t2200-add-update.sh                   |  1 +
 t/t2201-add-update-typechange.sh        |  1 +
 t/t2202-add-addremove.sh                |  1 +
 t/t2204-add-ignored.sh                  |  1 +
 t/t2300-cd-to-toplevel.sh               |  1 +
 t/t3000-ls-files-others.sh              |  1 +
 t/t3004-ls-files-basic.sh               |  1 +
 t/t3006-ls-files-long.sh                |  1 +
 t/t3008-ls-files-lazy-init-name-hash.sh |  1 +
 t/t3100-ls-tree-restrict.sh             |  1 +
 t/t3101-ls-tree-dirname.sh              |  1 +
 t/t3102-ls-tree-wildcards.sh            |  1 +
 t/t3103-ls-tree-misc.sh                 |  1 +
 t/t3205-branch-color.sh                 |  1 +
 t/t3211-peel-ref.sh                     |  1 +
 t/t3300-funny-names.sh                  |  1 +
 t/t3902-quoted.sh                       |  1 +
 t/t4002-diff-basic.sh                   |  1 +
 t/t4026-color.sh                        |  1 +
 t/t4300-merge-tree.sh                   |  1 +
 t/test-lib.sh                           | 22 +++++++++++++++++++
 t/test-pragma-SANITIZE=leak-ok.sh       |  8 +++++++
 65 files changed, 128 insertions(+), 14 deletions(-)
 create mode 100644 t/test-pragma-SANITIZE=leak-ok.sh

Range-diff against v2:
-:  ----------- > 1:  85619728d41 Makefile: add SANITIZE=leak flag to GIT-BUILD-OPTIONS
-:  ----------- > 2:  91c36b94eaa CI: refactor "if" to "case" statement
1:  df5a44e70b5 ! 3:  7e3577e4e3c tests: add a test mode for SANITIZE=leak, run it in CI
    @@ Commit message
         corresponding GIT_TEST_* mode for it, i.e. memory leaks have been
         fixed as one-offs without structured regression testing.
     
    -    This change add such a mode, we now have new
    -    linux-{clang,gcc}-sanitize-leak CI targets, these targets run the same
    -    tests as linux-{clang,gcc}, except that almost all of them are
    -    skipped.
    +    This change add such a mode, and a new linux-SANITIZE=leak CI
    +    target. The test mode and CI target only runs a whitelist of
    +    known-good tests using a mechanism discussed below, to ensure that we
    +    won't add regressions to code that's had its memory leaks fixed.
     
    -    There is a whitelist of some tests that are OK in test-lib.sh, and
    -    individual tests can be opted-in by setting
    -    GIT_TEST_SANITIZE_LEAK=true before sourcing test-lib.sh. Within those
    -    individual test can be skipped with the "!SANITIZE_LEAK"
    -    prerequisite. See the updated t/README for more details.
    +    The CI target uses a new GIT_TEST_PASSING_SANITIZE_LEAK=true test
    +    mode. When running in that mode all tests except those that have opted
    +    themselves in to running by setting and exporting
    +    TEST_PASSES_SANITIZE_LEAK=true before sourcing test-lib.sh.
     
    -    I'm using the GIT_TEST_SANITIZE_LEAK=true and !SANITIZE_LEAK pattern
    -    in a couple of tests whose memory leaks I'll fix in subsequent
    -    commits.
    +    I'm adding a "test-pragma-SANITIZE=leak-ok.sh" wrapper for setting and
    +    exporting that variable, as the assignment/export boilerplate would
    +    otherwise get quite verbose and repetitive in subsequent commits.
     
    -    I'm not being aggressive about opting in tests, it's not all tests
    -    that currently pass under SANITIZE=leak, just a small number of
    -    known-good tests. We can add more later as we fix leaks and grow more
    -    confident in this test mode.
    +    The tests using the "test-pragma-SANITIZE=leak-ok.sh" pragma can in
    +    turn make use of the "SANITIZE_LEAK" prerequisite added in a preceding
    +    commit, should they wish to selectively skip tests even under
    +    "GIT_TEST_PASSING_SANITIZE_LEAK=true".
     
    -    See the recent discussion at [1] about the lack of this sort of test
    -    mode, and 0e5bba53af (add UNLEAK annotation for reducing leak false
    -    positives, 2017-09-08) for the initial addition of SANITIZE=leak.
    +    Now tests that don't set the "test-pragma-SANITIZE=leak-ok.sh" pragma
    +    will be skipped under GIT_TEST_PASSING_SANITIZE_LEAK=true:
    +
    +        $ GIT_TEST_PASSING_SANITIZE_LEAK=true ./t0001-init.sh
    +        1..0 # SKIP skip all tests in t0001 under SANITIZE=leak, TEST_PASSES_SANITIZE_LEAK not set
    +
    +    In subsequents commit we'll conservatively add more
    +    TEST_PASSES_SANITIZE_LEAK=true annotations. The idea is that as memory
    +    leaks are fixed we can add more known-good tests to this CI target, to
    +    ensure that we won't have regressions.
    +
    +    As of writing this we've got major regressions between master..seen,
    +    i.e. the t000*.sh tests and more fixed since 31f9acf9ce2 (Merge branch
    +    'ah/plugleaks', 2021-08-04) have regressed recently.
    +
    +    See the discussion at <87czsv2idy.fsf@evledraar.gmail.com> about the
    +    lack of this sort of test mode, and 0e5bba53af (add UNLEAK annotation
    +    for reducing leak false positives, 2017-09-08) for the initial
    +    addition of SANITIZE=leak.
     
         See also 09595ab381 (Merge branch 'jk/leak-checkers', 2017-09-19),
         7782066f67 (Merge branch 'jk/apache-lsan', 2019-05-19) and the recent
         936e58851a (Merge branch 'ah/plugleaks', 2021-05-07) for some of the
         past history of "one-off" SANITIZE=leak (and more) fixes.
     
    -    When calling maybe_skip_all_sanitize_leak matching against
    -    "$TEST_NAME" instead of "$this_test" as other "match_pattern_list()"
    -    users do is intentional. I'd like to match things like "t13*config*"
    -    in subsequent commits. This part of the API isn't public, so we can
    -    freely change it in the future.
    -
    -    1. https://lore.kernel.org/git/87czsv2idy.fsf@evledraar.gmail.com/
    -
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## .github/workflows/main.yml ##
    @@ .github/workflows/main.yml: jobs:
                - jobname: linux-gcc-default
                  cc: gcc
                  pool: ubuntu-latest
    -+          - jobname: linux-clang-sanitize-leak
    -+            cc: clang
    -+            pool: ubuntu-latest
    -+          - jobname: linux-gcc-sanitize-leak
    -+            cc: gcc
    ++          - jobname: linux-SANITIZE=leak
     +            pool: ubuntu-latest
          env:
            CC: ${{matrix.vector.cc}}
            jobname: ${{matrix.vector.jobname}}
     
    - ## Makefile ##
    -@@ Makefile: PTHREAD_CFLAGS =
    - SPARSE_FLAGS ?=
    - SP_EXTRA_FLAGS = -Wno-universal-initializer
    - 
    -+# For informing GIT-BUILD-OPTIONS of the SANITIZE=leak target
    -+SANITIZE_LEAK =
    -+
    - # For the 'coccicheck' target; setting SPATCH_BATCH_SIZE higher will
    - # usually result in less CPU usage at the cost of higher peak memory.
    - # Setting it to 0 will feed all files in a single spatch invocation.
    -@@ Makefile: BASIC_CFLAGS += -DSHA1DC_FORCE_ALIGNED_ACCESS
    - endif
    - ifneq ($(filter leak,$(SANITIZERS)),)
    - BASIC_CFLAGS += -DSUPPRESS_ANNOTATED_LEAKS
    -+SANITIZE_LEAK = YesCompiledWithIt
    - endif
    - ifneq ($(filter address,$(SANITIZERS)),)
    - NO_REGEX = NeededForASAN
    -@@ Makefile: GIT-BUILD-OPTIONS: FORCE
    - 	@echo NO_UNIX_SOCKETS=\''$(subst ','\'',$(subst ','\'',$(NO_UNIX_SOCKETS)))'\' >>$@+
    - 	@echo PAGER_ENV=\''$(subst ','\'',$(subst ','\'',$(PAGER_ENV)))'\' >>$@+
    - 	@echo DC_SHA1=\''$(subst ','\'',$(subst ','\'',$(DC_SHA1)))'\' >>$@+
    -+	@echo SANITIZE_LEAK=\''$(subst ','\'',$(subst ','\'',$(SANITIZE_LEAK)))'\' >>$@+
    - 	@echo X=\'$(X)\' >>$@+
    - ifdef TEST_OUTPUT_DIRECTORY
    - 	@echo TEST_OUTPUT_DIRECTORY=\''$(subst ','\'',$(subst ','\'',$(TEST_OUTPUT_DIRECTORY)))'\' >>$@+
    -
      ## ci/install-dependencies.sh ##
     @@ ci/install-dependencies.sh: UBUNTU_COMMON_PKGS="make libssl-dev libcurl4-openssl-dev libexpat-dev
       libemail-valid-perl libio-socket-ssl-perl libnet-smtp-ssl-perl"
      
      case "$jobname" in
     -linux-clang|linux-gcc)
    -+linux-clang|linux-gcc|linux-clang-sanitize-leak|linux-gcc-sanitize-leak)
    ++linux-clang|linux-gcc|linux-SANITIZE=leak)
      	sudo apt-add-repository -y "ppa:ubuntu-toolchain-r/test"
      	sudo apt-get -q update
      	sudo apt-get -q -y install language-pack-is libsvn-perl apache2 \
      		$UBUNTU_COMMON_PKGS
      	case "$jobname" in
     -	linux-gcc)
    -+	linux-gcc|linux-gcc-sanitize-leak)
    ++	linux-gcc|linux-SANITIZE=leak)
      		sudo apt-get -q -y install gcc-8
      		;;
      	esac
    @@ ci/lib.sh: export GIT_TEST_CLONE_2GB=true
      
      case "$jobname" in
     -linux-clang|linux-gcc)
    --	if [ "$jobname" = linux-gcc ]
    --	then
    -+linux-clang|linux-gcc|linux-clang-sanitize-leak|linux-gcc-sanitize-leak)
    -+	case "$jobname" in
    -+	linux-gcc|linux-gcc-sanitize-leak)
    ++linux-clang|linux-gcc|linux-SANITIZE=leak)
    + 	case "$jobname" in
    +-	linux-gcc)
    ++	linux-gcc|linux-SANITIZE=leak)
      		export CC=gcc-8
      		MAKEFLAGS="$MAKEFLAGS PYTHON_PATH=/usr/bin/python3"
    --	else
    -+		;;
    -+	*)
    - 		MAKEFLAGS="$MAKEFLAGS PYTHON_PATH=/usr/bin/python2"
    --	fi
    -+		;;
    -+	esac
    - 
    - 	export GIT_TEST_HTTPD=true
    - 
    + 		;;
     @@ ci/lib.sh: linux-musl)
      	;;
      esac
      
     +case "$jobname" in
    -+linux-clang-sanitize-leak|linux-gcc-sanitize-leak)
    ++linux-SANITIZE=leak)
     +	export SANITIZE=leak
    ++	export GIT_TEST_PASSING_SANITIZE_LEAK=true
     +	;;
     +esac
     +
    @@ ci/run-build-and-tests.sh: esac
      make
      case "$jobname" in
     -linux-gcc)
    -+linux-gcc|linux-gcc-sanitize-leak)
    ++linux-gcc|linux-SANITIZE=leak)
      	export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
      	make test
      	export GIT_TEST_SPLIT_INDEX=yes
    @@ ci/run-build-and-tests.sh: linux-gcc)
      	make test
      	;;
     -linux-clang)
    -+linux-clang|linux-clang-sanitize-leak)
    ++linux-clang|linux-SANITIZE=leak)
      	export GIT_TEST_DEFAULT_HASH=sha1
      	make test
      	export GIT_TEST_DEFAULT_HASH=sha256
    @@ t/README: GIT_TEST_CHECKOUT_WORKERS=<n> overrides the 'checkout.workers' setting
      to <n> and 'checkout.thresholdForParallelism' to 0, forcing the
      execution of the parallel-checkout code.
      
    -+GIT_TEST_SANITIZE_LEAK=<boolean> will force the tests to run when git
    -+is compiled with SANITIZE=leak (we pick it up via
    -+../GIT-BUILD-OPTIONS).
    -+
    -+By default all tests are skipped when compiled with SANITIZE=leak, and
    -+individual test scripts opt themselves in to leak testing by setting
    -+GIT_TEST_SANITIZE_LEAK=true before sourcing test-lib.sh. Within those
    -+tests use the SANITIZE_LEAK prerequisite to skip individiual tests
    -+(i.e. test_expect_success !SANITIZE_LEAK [...]).
    -+
    -+So the GIT_TEST_SANITIZE_LEAK setting is different in behavior from
    -+both other GIT_TEST_*=[true|false] settings, but more useful given how
    -+SANITIZE=leak works & the state of the test suite. Manually setting
    -+GIT_TEST_SANITIZE_LEAK=true is only useful during development when
    -+finding and fixing memory leaks.
    ++GIT_TEST_PASSING_SANITIZE_LEAK=<boolean> when compiled with
    ++SANITIZE=leak will run only those tests that have whitelisted
    ++themselves as passing with no memory leaks. Do this by sourcing
    ++"test-pragma-SANITIZE=leak-ok.sh" before sourcing "test-lib.sh" itself
    ++at the top of the test script. This test mode is used by the
    ++"linux-SANITIZE=leak" CI target.
     +
      Naming Tests
      ------------
      
     
    - ## t/t5701-git-serve.sh ##
    -@@ t/t5701-git-serve.sh: test_expect_success 'unexpected lines are not allowed in fetch request' '
    + ## t/t0000-basic.sh ##
    +@@ t/t0000-basic.sh: swapping compression and hashing order, the person who is making the
    + modification *should* take notice and update the test vectors here.
    + '
    + 
    ++. ./test-pragma-SANITIZE=leak-ok.sh
    + . ./test-lib.sh
      
    - # Test the basics of object-info
    - #
    --test_expect_success 'basics of object-info' '
    -+test_expect_success !SANITIZE_LEAK 'basics of object-info' '
    - 	test-tool pkt-line pack >in <<-EOF &&
    - 	command=object-info
    - 	object-format=$(test_oid algo)
    + try_local_xy () {
     
      ## t/test-lib.sh ##
    -@@ t/test-lib.sh: then
    - 	exit 1
    - fi
    - 
    -+# SANITIZE=leak test mode
    -+sanitize_leak_true=
    -+add_sanitize_leak_true () {
    -+	sanitize_leak_true="$sanitize_leak_true$1 "
    -+}
    -+
    -+sanitize_leak_false=
    -+add_sanitize_leak_false () {
    -+	sanitize_leak_false="$sanitize_leak_false$1 "
    -+}
    -+
    -+sanitize_leak_opt_in_msg="opt-in with GIT_TEST_SANITIZE_LEAK=true"
    -+maybe_skip_all_sanitize_leak () {
    -+	# Whitelist patterns
    -+	add_sanitize_leak_true 't000*'
    -+	add_sanitize_leak_true 't001*'
    -+	add_sanitize_leak_true 't006*'
    -+
    -+	# Blacklist patterns (overrides whitelist)
    -+	add_sanitize_leak_false 't000[469]*'
    -+	add_sanitize_leak_false 't001[2459]*'
    -+	add_sanitize_leak_false 't006[0248]*'
    -+
    -+	if match_pattern_list "$1" "$sanitize_leak_false"
    -+	then
    -+		skip_all="test $this_test on SANITIZE=leak blacklist, $sanitize_leak_opt_in_msg"
    -+		test_done
    -+	elif match_pattern_list "$1" "$sanitize_leak_true"
    -+	then
    -+		return 0
    -+	fi
    -+	return 1
    -+}
    -+
    - # Are we running this test at all?
    - remove_trash=
    - this_test=${0##*/}
     @@ t/test-lib.sh: then
      	test_done
      fi
    @@ t/test-lib.sh: then
     +# SANITIZE=leak
     +if test -n "$SANITIZE_LEAK"
     +then
    -+	if test -z "$GIT_TEST_SANITIZE_LEAK" &&
    -+		maybe_skip_all_sanitize_leak "$TEST_NAME"
    ++	if test_bool_env GIT_TEST_PASSING_SANITIZE_LEAK false
     +	then
    -+		say_color info >&3 "test $this_test on SANITIZE=leak whitelist"
    -+		GIT_TEST_SANITIZE_LEAK=true
    -+	fi
    ++		# We need to see it in "git env--helper" (via
    ++		# test_bool_env)
    ++		export TEST_PASSES_SANITIZE_LEAK
     +
    -+	# We need to see it in "git env--helper" (via
    -+	# test_bool_env)
    -+	export GIT_TEST_SANITIZE_LEAK
    -+
    -+	if ! test_bool_env GIT_TEST_SANITIZE_LEAK false
    -+	then
    -+		skip_all="skip all tests in $this_test under SANITIZE=leak, $sanitize_leak_opt_in_msg"
    -+		test_done
    ++		if ! test_bool_env TEST_PASSES_SANITIZE_LEAK false
    ++		then
    ++			skip_all="skipping $this_test under GIT_TEST_PASSING_SANITIZE_LEAK=true"
    ++			test_done
    ++		fi
     +	fi
    -+elif test_bool_env GIT_TEST_SANITIZE_LEAK false
    ++elif test_bool_env GIT_TEST_PASSING_SANITIZE_LEAK false
     +then
    -+	error "GIT_TEST_SANITIZE_LEAK=true has no effect except when compiled with SANITIZE=leak"
    ++	error "GIT_TEST_PASSING_SANITIZE_LEAK=true has no effect except when compiled with SANITIZE=leak"
     +fi
     +
      # Last-minute variable setup
      HOME="$TRASH_DIRECTORY"
      GNUPGHOME="$HOME/gnupg-home-not-used"
    -@@ t/test-lib.sh: test -z "$NO_PYTHON" && test_set_prereq PYTHON
    - test -n "$USE_LIBPCRE2" && test_set_prereq PCRE
    - test -n "$USE_LIBPCRE2" && test_set_prereq LIBPCRE2
    - test -z "$NO_GETTEXT" && test_set_prereq GETTEXT
    -+test -n "$SANITIZE_LEAK" && test_set_prereq SANITIZE_LEAK
    - 
    - if test -z "$GIT_TEST_CHECK_CACHE_TREE"
    - then
    +
    + ## t/test-pragma-SANITIZE=leak-ok.sh (new) ##
    +@@
    ++#!/bin/sh
    ++
    ++## This "pragma" (as in "perldoc perlpragma") declares that the test
    ++## will pass under GIT_TEST_PASSING_SANITIZE_LEAK=true. Source this
    ++## before sourcing test-lib.sh
    ++
    ++TEST_PASSES_SANITIZE_LEAK=true
    ++export TEST_PASSES_SANITIZE_LEAK
2:  51fd1c400da < -:  ----------- SANITIZE tests: fix memory leaks in t13*config*, add to whitelist
3:  720852eee0b < -:  ----------- SANITIZE tests: fix memory leaks in t5701*, add to whitelist
4:  80edda308c9 < -:  ----------- SANITIZE tests: fix leak in mailmap.c
-:  ----------- > 4:  0cd14d64165 tests: annotate t000*.sh with TEST_PASSES_SANITIZE_LEAK=true
-:  ----------- > 5:  ed5f5705755 tests: annotate t001*.sh with TEST_PASSES_SANITIZE_LEAK=true
-:  ----------- > 6:  2599016c4e7 tests: annotate t002*.sh with TEST_PASSES_SANITIZE_LEAK=true
-:  ----------- > 7:  ddc4d6d2cf1 tests: annotate select t0*.sh with TEST_PASSES_SANITIZE_LEAK=true
-:  ----------- > 8:  e611d2c23d9 tests: annotate select t*.sh with TEST_PASSES_SANITIZE_LEAK=true

Comments

Jeff King Sept. 1, 2021, 9:56 a.m. UTC | #1
On Tue, Aug 31, 2021 at 03:35:34PM +0200, Ævar Arnfjörð Bjarmason wrote:

>  * In v2 compiling with SANITIZE=leak would change things so only
>    known-good passing tests were run by default, everything else would
>    pass as a dummy. Now the default running of tests is unchanged, but
>    if we run with GIT_TEST_PASSING_SANITIZE_LEAK=true only those tests
>    are run which set and export TEST_PASSES_SANITIZE_LEAK=true.
> 
>  * The facility for declaring known-good tests in test-lib.sh based on
>    wildcards is gone, instead individual tests need to declare if
>    they're OK under SANITIZE=leak.[...]

Hmm. This still seems more complicated than we need. If we just want a
flag in each script, then test-lib.sh can use that flag to tweak
LSAN_OPTIONS. See the patch below.

That has two drawbacks:

  - it doesn't have any way to switch the flag per-test. But IMHO it is
    a mistake to go in that direction. This is all temporary scaffolding
    while we have leaks, and the script-level of granularity is fine.

  - it runs the tests not marked as LSAN-OK, just without leak checking,
    which is redundant in CI where we're already running them. But we
    could still be collecting leak stats (and just not failing the
    tests). See the patch below.

    If we do care about not running them, then I think it makes more
    sense to extend the run/skip mechanisms and build on that.

    (I also think I prefer the central list of "mark these scripts as OK
    for leak-checking", rather than annotating individuals. Because
    again, this is temporary, and it's nice to keep it in a sandbox that
    only people working on leak-checking would look at or touch).

I realize this is kind-of bikeshedding, and I'm not vehemently opposed
to what you have here. It just seems like fewer moving parts would be
less likely to confuse folks who want to poke at it.

>    This is done via "export
>    TEST_PASSES_SANITIZE_LEAK=true", there's a handy import of
>    "./test-pragma-SANITIZE=leak-ok.sh" before sourcing "./test-lib.sh"
>    itself to set this.

I found the extra level of indirection added by this pragma confusing.
We just need to set a variable, which is also a one-liner, and one that
is more obvious about what it's doing. In your code you also export it,
but that's not necessary for something that test-lib.sh is going to look
at. Or if it's really necessary at some point, then test-lib.sh can do
the export itself.

> Ævar Arnfjörð Bjarmason (8):
>   Makefile: add SANITIZE=leak flag to GIT-BUILD-OPTIONS
>   CI: refactor "if" to "case" statement
>   tests: add a test mode for SANITIZE=leak, run it in CI
>   tests: annotate t000*.sh with TEST_PASSES_SANITIZE_LEAK=true
>   tests: annotate t001*.sh with TEST_PASSES_SANITIZE_LEAK=true
>   tests: annotate t002*.sh with TEST_PASSES_SANITIZE_LEAK=true
>   tests: annotate select t0*.sh with TEST_PASSES_SANITIZE_LEAK=true
>   tests: annotate select t*.sh with TEST_PASSES_SANITIZE_LEAK=true

Sort of a meta-question, but what's the plan for folks who add a new
test to say t0000, and it reveals a leak in code they didn't touch?

They'll get a CI failure (as will Junio if he picks up the patch), so
somebody is going to have to deal with it. Do they fix it? Do they unset
the "this script is OK" flag? Do they mark the individual test as
non-lsan-ok?

I do like the idea of finding real regressions. But while the state of
leak-checking is still so immature, I'm worried about this adding extra
friction for developers. Especially if they get some spooky action at a
distance caused by a leak in far-away code.

Anyway, here's LSAN_OPTIONS thing I was thinking of.

---
diff --git a/t/t0001-init.sh b/t/t0001-init.sh
index df544bb321..b1da18955d 100755
--- a/t/t0001-init.sh
+++ b/t/t0001-init.sh
@@ -2,6 +2,7 @@
 
 test_description='git init'
 
+TEST_LSAN_OK=1
 . ./test-lib.sh
 
 check_config () {
diff --git a/t/test-lib.sh b/t/test-lib.sh
index abcfbed6d6..62627afeaf 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -44,9 +44,30 @@ GIT_BUILD_DIR="$TEST_DIRECTORY"/..
 : ${ASAN_OPTIONS=detect_leaks=0:abort_on_error=1}
 export ASAN_OPTIONS
 
-# If LSAN is in effect we _do_ want leak checking, but we still
-# want to abort so that we notice the problems.
-: ${LSAN_OPTIONS=abort_on_error=1}
+if test -n "$LSAN_OPTIONS"
+then
+	# Leave user-provided options alone.
+	:
+elif test -n "$TEST_LSAN_OK"
+then
+	# The test script has declared itself as LSAN-clean; turn on full leak
+	# checking.
+	LSAN_OPTIONS=abort_on_error=1
+else
+	# The test script has possible LSAN failures. Just disable
+	# leak-checking entirely. Another option would be to log the failures
+	# with:
+	#
+	#   LSAN_OPTIONS=exitcode=0:log_path=$TEST_DIRECTORY/lsan/out
+	#
+	# The results are rather confusing, though, as the logs are
+	# per-process; you have no idea which one came from which test script.
+	# Ideally we'd send them to descriptor 4 along with the rest of the
+	# script log, but there's no LSAN_OPTION for that (recent versions of
+	# libsanitizer do have a public function to do so, so we could hook it
+	# ourselves via common-main).
+	LSAN_OPTIONS=detect_leaks=0
+fi
 export LSAN_OPTIONS
 
 if test ! -f "$GIT_BUILD_DIR"/GIT-BUILD-OPTIONS
Jeff King Sept. 1, 2021, 10:42 a.m. UTC | #2
On Wed, Sep 01, 2021 at 05:56:31AM -0400, Jeff King wrote:

> +else
> +	# The test script has possible LSAN failures. Just disable
> +	# leak-checking entirely. Another option would be to log the failures
> +	# with:
> +	#
> +	#   LSAN_OPTIONS=exitcode=0:log_path=$TEST_DIRECTORY/lsan/out
> +	#
> +	# The results are rather confusing, though, as the logs are
> +	# per-process; you have no idea which one came from which test script.
> +	# Ideally we'd send them to descriptor 4 along with the rest of the
> +	# script log, but there's no LSAN_OPTION for that (recent versions of
> +	# libsanitizer do have a public function to do so, so we could hook it
> +	# ourselves via common-main).
> +	LSAN_OPTIONS=detect_leaks=0
> +fi

I was curious about the fd thing. The patch below implements it, and
lets t0203 (for example) pass while including its sanitizer output in a
--verbose-log.

But this is exactly the kind of gross complexity I was suggesting to
avoid. :)

diff --git a/Makefile b/Makefile
index d1feab008f..ba2174fb79 100644
--- a/Makefile
+++ b/Makefile
@@ -1260,6 +1260,7 @@ ifdef SANITIZE
 SANITIZERS := $(foreach flag,$(subst $(comma),$(space),$(SANITIZE)),$(flag))
 BASIC_CFLAGS += -fsanitize=$(SANITIZE) -fno-sanitize-recover=$(SANITIZE)
 BASIC_CFLAGS += -fno-omit-frame-pointer
+BASIC_CFLAGS += -DENABLE_CUSTOM_SANITIZER_OPTIONS
 ifneq ($(filter undefined,$(SANITIZERS)),)
 BASIC_CFLAGS += -DSHA1DC_FORCE_ALIGNED_ACCESS
 endif
diff --git a/common-main.c b/common-main.c
index 71e21dd20a..bff594ac04 100644
--- a/common-main.c
+++ b/common-main.c
@@ -2,6 +2,10 @@
 #include "exec-cmd.h"
 #include "attr.h"
 
+#ifdef ENABLE_CUSTOM_SANITIZER_OPTIONS
+#include <sanitizer/asan_interface.h>
+#endif
+
 /*
  * Many parts of Git have subprograms communicate via pipe, expect the
  * upstream of a pipe to die with SIGPIPE when the downstream of a
@@ -23,6 +27,18 @@ static void restore_sigpipe_to_default(void)
 	signal(SIGPIPE, SIG_DFL);
 }
 
+static void handle_custom_sanitizer_options(void)
+{
+#ifdef ENABLE_CUSTOM_SANITIZER_OPTIONS
+	const char *v;
+	v = getenv("GIT_SANITIZER_FD");
+	if (v) {
+		/* weird int-as-void interface from libsanitizer */
+		__sanitizer_set_report_fd((void *)(intptr_t)atoi(v));
+	}
+#endif
+}
+
 int main(int argc, const char **argv)
 {
 	int result;
@@ -37,6 +53,8 @@ int main(int argc, const char **argv)
 	sanitize_stdfds();
 	restore_sigpipe_to_default();
 
+	handle_custom_sanitizer_options();
+
 	git_resolve_executable_dir(argv[0]);
 
 	git_setup_gettext();
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 62627afeaf..674bd30c44 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -54,22 +54,16 @@ then
 	# checking.
 	LSAN_OPTIONS=abort_on_error=1
 else
-	# The test script has possible LSAN failures. Just disable
-	# leak-checking entirely. Another option would be to log the failures
-	# with:
-	#
-	#   LSAN_OPTIONS=exitcode=0:log_path=$TEST_DIRECTORY/lsan/out
-	#
-	# The results are rather confusing, though, as the logs are
-	# per-process; you have no idea which one came from which test script.
-	# Ideally we'd send them to descriptor 4 along with the rest of the
-	# script log, but there's no LSAN_OPTION for that (recent versions of
-	# libsanitizer do have a public function to do so, so we could hook it
-	# ourselves via common-main).
-	LSAN_OPTIONS=detect_leaks=0
+	# The test script has possible LSAN failures. Just log them but don't
+	# touch the exit code.
+	LSAN_OPTIONS=exitcode=0
 fi
 export LSAN_OPTIONS
 
+# If we do generate output, try to avoid it getting tangled up with stderr.
+GIT_SANITIZER_FD=4
+export GIT_SANITIZER_FD
+
 if test ! -f "$GIT_BUILD_DIR"/GIT-BUILD-OPTIONS
 then
 	echo >&2 'error: GIT-BUILD-OPTIONS missing (has Git been built?).'
@@ -463,6 +457,7 @@ unset VISUAL EMAIL LANGUAGE $("$PERL_PATH" -e '
 		PERF_
 		CURL_VERBOSE
 		TRACE_CURL
+		SANITIZER_.*
 	));
 	my @vars = grep(/^GIT_/ && !/^GIT_($ok)/o, @env);
 	print join("\n", @vars);
Ævar Arnfjörð Bjarmason Sept. 2, 2021, 12:25 p.m. UTC | #3
On Wed, Sep 01 2021, Jeff King wrote:

> On Tue, Aug 31, 2021 at 03:35:34PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>>  * In v2 compiling with SANITIZE=leak would change things so only
>>    known-good passing tests were run by default, everything else would
>>    pass as a dummy. Now the default running of tests is unchanged, but
>>    if we run with GIT_TEST_PASSING_SANITIZE_LEAK=true only those tests
>>    are run which set and export TEST_PASSES_SANITIZE_LEAK=true.
>> 
>>  * The facility for declaring known-good tests in test-lib.sh based on
>>    wildcards is gone, instead individual tests need to declare if
>>    they're OK under SANITIZE=leak.[...]
>
> Hmm. This still seems more complicated than we need. If we just want a
> flag in each script, then test-lib.sh can use that flag to tweak
> LSAN_OPTIONS. See the patch below.

On the "pragma" include v.s. env var + export: I figured this would be
easier to read as I thought the export was required (I don't think it is
in most cases, but e.g. for t0000*.sh I think it is, but that's from
memory...).

> That has two drawbacks:
>
>   - it doesn't have any way to switch the flag per-test. But IMHO it is
>     a mistake to go in that direction. This is all temporary scaffolding
>     while we have leaks, and the script-level of granularity is fine.

We have a lot of tests that do simple checking of the tool itself, and
later in the script might be stressing trace2, or common sources of
leaks like "git log" in combination with the tool (e.g. the commit-graph
tests).

So being able to tweak this inside the script is useful, but that can of
course also be done with this proposed TEST_LSAN_OK + prereq.

>   - it runs the tests not marked as LSAN-OK, just without leak checking,
>     which is redundant in CI where we're already running them. But we
>     could still be collecting leak stats (and just not failing the
>     tests). See the patch below.

Sure, I'd prefer 

>     If we do care about not running them, then I think it makes more
>     sense to extend the run/skip mechanisms and build on that.

The patch I have here is already nicely integrated with the skip
mechanism. I.e. we use skip_all which shows a summary in any TAP
consumer, and we can skip individual tests with prerequisites.

>     (I also think I prefer the central list of "mark these scripts as OK
>     for leak-checking", rather than annotating individuals. Because
>     again, this is temporary, and it's nice to keep it in a sandbox that
>     only people working on leak-checking would look at or touch).
>
> I realize this is kind-of bikeshedding, and I'm not vehemently opposed
> to what you have here. It just seems like fewer moving parts would be
> less likely to confuse folks who want to poke at it.

I can see that for the proposed v2 mechanism, but in this v3 nothing
changes unless you opt-in to things via new GIT_TEST_* setting. So the
chance for confusion seems minimal to nonexisting.

I was interested in doing some summaries of existing leaks
eventually. It seems even with LSAN_OPTIONS=detect_leaks=0 compiling
with SANITIZE=leak make things a bit slower, but not by much (but actual
leak checking is much slower).

But I'd prefer to leave any "write out leak logs and summarize" step for
some later change.

>>    This is done via "export
>>    TEST_PASSES_SANITIZE_LEAK=true", there's a handy import of
>>    "./test-pragma-SANITIZE=leak-ok.sh" before sourcing "./test-lib.sh"
>>    itself to set this.
>
> I found the extra level of indirection added by this pragma confusing.
> We just need to set a variable, which is also a one-liner, and one that
> is more obvious about what it's doing. In your code you also export it,
> but that's not necessary for something that test-lib.sh is going to look
> at. Or if it's really necessary at some point, then test-lib.sh can do
> the export itself.

*nod*, will remove it per discussion above.

>> Ævar Arnfjörð Bjarmason (8):
>>   Makefile: add SANITIZE=leak flag to GIT-BUILD-OPTIONS
>>   CI: refactor "if" to "case" statement
>>   tests: add a test mode for SANITIZE=leak, run it in CI
>>   tests: annotate t000*.sh with TEST_PASSES_SANITIZE_LEAK=true
>>   tests: annotate t001*.sh with TEST_PASSES_SANITIZE_LEAK=true
>>   tests: annotate t002*.sh with TEST_PASSES_SANITIZE_LEAK=true
>>   tests: annotate select t0*.sh with TEST_PASSES_SANITIZE_LEAK=true
>>   tests: annotate select t*.sh with TEST_PASSES_SANITIZE_LEAK=true
>
> Sort of a meta-question, but what's the plan for folks who add a new
> test to say t0000, and it reveals a leak in code they didn't touch?

Then CI will fail on this job. We'd have those same failures now
(e.g. the mentioned current delta between master..seen), we just don't
see them. Having visibility on them seems like an improvement.

> They'll get a CI failure (as will Junio if he picks up the patch), so
> somebody is going to have to deal with it. Do they fix it? Do they unset
> the "this script is OK" flag? Do they mark the individual test as
> non-lsan-ok?

I'd think they'd fix it, or make marking the regression as OK part of
their re-roll, just like failures on master..seen now.

If you're getting at that we should start out this job as an FYI job
that doesn't impact the CI run's overall status if it fails I think that
would be OK as a start.

> I do like the idea of finding real regressions. But while the state of
> leak-checking is still so immature, I'm worried about this adding extra
> friction for developers. Especially if they get some spooky action at a
> distance caused by a leak in far-away code.

Yeah, ultimately this series is an implicit endorsement of us caring
more than we do now.

I think this friction point is going to be mitigated a lot by the
ability I've added to not just skip entire test scripts, but allow
prereq skipping of some tests, early bailing out etc.

It allows you to say add a "git log" test at the end of some test that
otherwise just uses some core API or a test tool and not have to throw
the baby out with the bathwater in terms of disabling all existing leak
checks there to make forward progress (or split up the entire test
script).

> Anyway, here's LSAN_OPTIONS thing I was thinking of.

Thanks, that & your follow-up is very interesting. Can I assume this has
your SOB? I'd like to add that redirect to fd 4 change to this series.

> diff --git a/t/t0001-init.sh b/t/t0001-init.sh
> index df544bb321..b1da18955d 100755
> --- a/t/t0001-init.sh
> +++ b/t/t0001-init.sh
> @@ -2,6 +2,7 @@
>  
>  test_description='git init'
>  
> +TEST_LSAN_OK=1
>  . ./test-lib.sh
>  
>  check_config () {
> diff --git a/t/test-lib.sh b/t/test-lib.sh
> index abcfbed6d6..62627afeaf 100644
> --- a/t/test-lib.sh
> +++ b/t/test-lib.sh
> @@ -44,9 +44,30 @@ GIT_BUILD_DIR="$TEST_DIRECTORY"/..
>  : ${ASAN_OPTIONS=detect_leaks=0:abort_on_error=1}
>  export ASAN_OPTIONS
>  
> -# If LSAN is in effect we _do_ want leak checking, but we still
> -# want to abort so that we notice the problems.
> -: ${LSAN_OPTIONS=abort_on_error=1}
> +if test -n "$LSAN_OPTIONS"
> +then
> +	# Leave user-provided options alone.
> +	:
> +elif test -n "$TEST_LSAN_OK"
> +then
> +	# The test script has declared itself as LSAN-clean; turn on full leak
> +	# checking.
> +	LSAN_OPTIONS=abort_on_error=1
> +else
> +	# The test script has possible LSAN failures. Just disable
> +	# leak-checking entirely. Another option would be to log the failures
> +	# with:
> +	#
> +	#   LSAN_OPTIONS=exitcode=0:log_path=$TEST_DIRECTORY/lsan/out
> +	#
> +	# The results are rather confusing, though, as the logs are
> +	# per-process; you have no idea which one came from which test script.
> +	# Ideally we'd send them to descriptor 4 along with the rest of the
> +	# script log, but there's no LSAN_OPTION for that (recent versions of
> +	# libsanitizer do have a public function to do so, so we could hook it
> +	# ourselves via common-main).
> +	LSAN_OPTIONS=detect_leaks=0
> +fi
>  export LSAN_OPTIONS
>  
>  if test ! -f "$GIT_BUILD_DIR"/GIT-BUILD-OPTIONS
Jeff King Sept. 3, 2021, 11:13 a.m. UTC | #4
On Thu, Sep 02, 2021 at 02:25:33PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > Hmm. This still seems more complicated than we need. If we just want a
> > flag in each script, then test-lib.sh can use that flag to tweak
> > LSAN_OPTIONS. See the patch below.
> 
> On the "pragma" include v.s. env var + export: I figured this would be
> easier to read as I thought the export was required (I don't think it is
> in most cases, but e.g. for t0000*.sh I think it is, but that's from
> memory...).

I admit that half of my complaint with the pragma is the weird filename
with an "=" in it. :) But I do think just assigning the variable is the
most readable thing.  If t0000 needs to export for whatever reason, it
can do so (preferably with a comment explaining why).

> > That has two drawbacks:
> >
> >   - it doesn't have any way to switch the flag per-test. But IMHO it is
> >     a mistake to go in that direction. This is all temporary scaffolding
> >     while we have leaks, and the script-level of granularity is fine.
> 
> We have a lot of tests that do simple checking of the tool itself, and
> later in the script might be stressing trace2, or common sources of
> leaks like "git log" in combination with the tool (e.g. the commit-graph
> tests).
> 
> So being able to tweak this inside the script is useful, but that can of
> course also be done with this proposed TEST_LSAN_OK + prereq.

Getting rid of the "let's tell the tests that we were built with LSAN"
was part of the simplicity I was going for (and obviously does preclude
a prerequisite). I had hoped we wouldn't need to do per-test stuff,
because this was all a temporary state. But maybe that's naive.

> >     If we do care about not running them, then I think it makes more
> >     sense to extend the run/skip mechanisms and build on that.
> 
> The patch I have here is already nicely integrated with the skip
> mechanism. I.e. we use skip_all which shows a summary in any TAP
> consumer, and we can skip individual tests with prerequisites.

I meant here that we'd be driving the selection externally from the
tests using the skip/run mechanisms (something along the lines of what I
sketched out before).

But I admit that there isn't really a big difference between the two
approaches. Since you've coded this one up already, let's go in that
direction (i.e., this series).

> I was interested in doing some summaries of existing leaks
> eventually. It seems even with LSAN_OPTIONS=detect_leaks=0 compiling
> with SANITIZE=leak make things a bit slower, but not by much (but actual
> leak checking is much slower).
> 
> But I'd prefer to leave any "write out leak logs and summarize" step for
> some later change.

OK, I can live with that (especially given how apparently difficult it
is to convince LSAN to do it).

> > Sort of a meta-question, but what's the plan for folks who add a new
> > test to say t0000, and it reveals a leak in code they didn't touch?
> 
> Then CI will fail on this job. We'd have those same failures now
> (e.g. the mentioned current delta between master..seen), we just don't
> see them. Having visibility on them seems like an improvement.
> 
> > They'll get a CI failure (as will Junio if he picks up the patch), so
> > somebody is going to have to deal with it. Do they fix it? Do they unset
> > the "this script is OK" flag? Do they mark the individual test as
> > non-lsan-ok?
> 
> I'd think they'd fix it, or make marking the regression as OK part of
> their re-roll, just like failures on master..seen now.
> 
> If you're getting at that we should start out this job as an FYI job
> that doesn't impact the CI run's overall status if it fails I think that
> would be OK as a start.

I think that would be OK, but I'm not quite sure of the best way to do
it. Why don't we start it as a regular required job, and then we can see
how often it is causing a headache. If once every few months somebody
fixes a leak, I'd be happy. If new developers are getting tangled up
constantly in unrelated leaks, then that's something we'd need to
revisit.

> > I do like the idea of finding real regressions. But while the state of
> > leak-checking is still so immature, I'm worried about this adding extra
> > friction for developers. Especially if they get some spooky action at a
> > distance caused by a leak in far-away code.
> 
> Yeah, ultimately this series is an implicit endorsement of us caring
> more than we do now.
> 
> I think this friction point is going to be mitigated a lot by the
> ability I've added to not just skip entire test scripts, but allow
> prereq skipping of some tests, early bailing out etc.

I half-agree with your final paragraph. The biggest friction point I
think will be for new folks when CI starts failing, and they don't
understand why (or where the problem is, or how to debug it, etc). But
like I said, let's see what happens.

> > Anyway, here's LSAN_OPTIONS thing I was thinking of.
> 
> Thanks, that & your follow-up is very interesting. Can I assume this has
> your SOB? I'd like to add that redirect to fd 4 change to this series.

Yes, go for it.

-Peff