diff mbox series

racy leak sanitizer builds, was Re: [PATCH 0/9] commit-reach: -Wsign-compare follow-ups

Message ID 20241228190541.GA815586@coredump.intra.peff.net (mailing list archive)
State New
Headers show
Series racy leak sanitizer builds, was Re: [PATCH 0/9] commit-reach: -Wsign-compare follow-ups | expand

Commit Message

Jeff King Dec. 28, 2024, 7:05 p.m. UTC
On Sat, Dec 28, 2024 at 09:27:41AM +0100, Patrick Steinhardt wrote:

>   - The leak-checking jobs fail quite regularly in t0003 with something
>     that feels like either a race caused by a leak or an issue with the
>     sanitizer itself [2]:
> 
>     ==git==17055==ERROR: LeakSanitizer: detected memory leaks
>     Direct leak of 32 byte(s) in 1 object(s) allocated from:
>     #0 0x7aa0d03c7713 in realloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:98
>     #1 0x7aa0d0221f69 in pthread_getattr_np (/lib/x86_64-linux-gnu/libc.so.6+0x9df69) (BuildId: 6d64b17fbac799e68da7ebd9985ddf9b5cb375e6)
>     #2 0x7aa0d03d9544 in __sanitizer::GetThreadStackTopAndBottom(bool, unsigned long*, unsigned long*) ../../../../src/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:150
>     #3 0x7aa0d03d96fa in __sanitizer::GetThreadStackAndTls(bool, unsigned long*, unsigned long*, unsigned long*, unsigned long*) ../../../../src/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:614
>     #4 0x7aa0d03cb2b9 in __lsan::ThreadStart(unsigned int, unsigned long long, __sanitizer::ThreadType) ../../../../src/libsanitizer/lsan/lsan_posix.cpp:53
>     #5 0x7aa0d03c756a in ThreadStartFunc<false> ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:431
>     #6 0x7aa0d0220a93  (/lib/x86_64-linux-gnu/libc.so.6+0x9ca93) (BuildId: 6d64b17fbac799e68da7ebd9985ddf9b5cb375e6)
>     #7 0x7aa0d02adc3b  (/lib/x86_64-linux-gnu/libc.so.6+0x129c3b) (BuildId: 6d64b17fbac799e68da7ebd9985ddf9b5cb375e6)
>     DEDUP_TOKEN: ___interceptor_realloc--pthread_getattr_np--__sanitizer::GetThreadStackTopAndBottom(bool, unsigned long*, unsigned long*)--__sanitizer::GetThreadStackAndTls(bool, unsigned long*, unsigned long*, unsigned long*, unsigned long*)--__lsan::ThreadStart(unsigned int, unsigned long long, __sanitizer::ThreadType)--ThreadStartFunc<false>----
>     SUMMARY: LeakSanitizer: 32 byte(s) leaked in 1 allocation(s).

I did a bit of digging on this last week, but didn't come up with a very
satisfactory solution. You can reproduce easily by building with
SANITIZE=leak and running t0003 with --stress.

It's the same issue we tried to address in 993d38a066 (index-pack: spawn
threads atomically, 2024-01-05): one thread calls exit() and takes down
the process while other threads are still being spawned (so the leak
checker runs while those other threads are in a weird state where lsan
has allocated some memory but not yet set up the thread stack as a place
to look for reachable memory).

There we dealt with it by making sure no thread started work (and thus
hit the exit call) until all of them were sanitized. I tried doing
something similar here, like:

  diff --git a/builtin/grep.c b/builtin/grep.c
  index 98b85c7fca..866645e6f8 100644
  --- a/builtin/grep.c
  +++ b/builtin/grep.c
  @@ -233,18 +233,20 @@ static void start_threads(struct grep_opt *opt)
                  strbuf_init(&todo[i].out, 0);
          }

  +       grep_lock();
          CALLOC_ARRAY(threads, num_threads);
          for (i = 0; i < num_threads; i++) {
                  int err;
                  struct grep_opt *o = grep_opt_dup(opt);
                  o->output = strbuf_out;
                  compile_grep_patterns(o);
                  err = pthread_create(&threads[i], NULL, run, o);

                  if (err)
                          die(_("grep: failed to create thread: %s"),
                              strerror(err));
          }
  +       grep_unlock();
   }

   static int wait_all(void)

but it doesn't work. In fact, this does nothing at all because each
thread will start by looking for work to do via get_work(), and we do
not call add_work() to give them anything to do until all threads are
spawned.

So I suspect the race is actually trickier, and that the "weird state"
is not something that happens just while pthread_create() is being
called, but is actually running _in the thread itself_. So even though
pthread_create() has returned for each thread, they are still setting
themselves up before running.

It mostly worked in 993d38a066 because index-pack has to do more work to
get to the exit() call. So delaying the start of the threads was enough
to usually win the race. But here, the individual grep threads get to
the exit() call very quickly, and it's not enough.

Which would mean that 993d38a066 is not actually a full fix, either. And
indeed, I can get t5309 to fail even with that patch (which is weird,
because that was how I tested it originally; I wonder if anything on the
LSan side changed?).

So a full fix would actually require synchronization where we spawn each
thread, then wait for all of them to hit the barrier to declare
themselves ready, and then let them all start running. There is a
pthread_barrier type that would help with this, but we've never used it
before (so we'd probably need to at least provide a Windows compat
layer).

One quick workaround is this:


Or you could even imagine automatically forcing online_cpus() to "1" for
LSan builds, which would fix it everywhere. But then we'd miss any leaks
that are specific to the threaded code.

Or of course we could try to engage with LSan folks about whether this
can be fixed there. I don't think we've reported it anywhere there.

-Peff

Comments

Jeff King Dec. 28, 2024, 7:23 p.m. UTC | #1
On Sat, Dec 28, 2024 at 02:05:41PM -0500, Jeff King wrote:

> So I suspect the race is actually trickier, and that the "weird state"
> is not something that happens just while pthread_create() is being
> called, but is actually running _in the thread itself_. So even though
> pthread_create() has returned for each thread, they are still setting
> themselves up before running.
> [...]
> So a full fix would actually require synchronization where we spawn each
> thread, then wait for all of them to hit the barrier to declare
> themselves ready, and then let them all start running. There is a
> pthread_barrier type that would help with this, but we've never used it
> before (so we'd probably need to at least provide a Windows compat
> layer).

So here's a fix that seems to work, but doesn't address the portability
issues. One tweak is that the exit() call is actually not in a
sub-thread, but in the main thread itself. I don't think that really
changes the analysis too much, but it does mean we need to include the
main thread in the barrier wait.

diff --git a/builtin/grep.c b/builtin/grep.c
index d00ee76f24..933b4503b8 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -101,6 +101,9 @@ static pthread_cond_t cond_write;
 /* Signalled when we are finished with everything. */
 static pthread_cond_t cond_result;
 
+/* Synchronize the start of all threads */
+static pthread_barrier_t start_barrier;
+
 static int skip_first_line;
 
 static void add_work(struct grep_opt *opt, struct grep_source *gs)
@@ -198,6 +201,8 @@ static void *run(void *arg)
 	int hit = 0;
 	struct grep_opt *opt = arg;
 
+	pthread_barrier_wait(&start_barrier);
+
 	while (1) {
 		struct work_item *w = get_work();
 		if (!w)
@@ -229,6 +234,7 @@ static void start_threads(struct grep_opt *opt)
 	pthread_cond_init(&cond_add, NULL);
 	pthread_cond_init(&cond_write, NULL);
 	pthread_cond_init(&cond_result, NULL);
+	pthread_barrier_init(&start_barrier, NULL, num_threads + 1);
 	grep_use_locks = 1;
 	enable_obj_read_lock();
 
@@ -248,6 +254,7 @@ static void start_threads(struct grep_opt *opt)
 			die(_("grep: failed to create thread: %s"),
 			    strerror(err));
 	}
+	pthread_barrier_wait(&start_barrier);
 }
 
 static int wait_all(void)
@@ -284,6 +291,7 @@ static int wait_all(void)
 	pthread_cond_destroy(&cond_add);
 	pthread_cond_destroy(&cond_write);
 	pthread_cond_destroy(&cond_result);
+	pthread_barrier_destroy(&start_barrier);
 	grep_use_locks = 0;
 	disable_obj_read_lock();
 

It would probably be pretty easy to do something similar for index-pack,
though really this is something I guess we'd need to do manually
anywhere we spawn worker threads.

-Peff
Jeff King Dec. 28, 2024, 7:31 p.m. UTC | #2
On Sat, Dec 28, 2024 at 02:23:07PM -0500, Jeff King wrote:

> It would probably be pretty easy to do something similar for index-pack,
> though really this is something I guess we'd need to do manually
> anywhere we spawn worker threads.

Here's what that would look like. Confirmed that this fixes the issue
running t5309 with --stress. Dropping the extra locks in
resolve_deltas() is just backing out that earlier insufficient attempt
to synchronize things.

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index d773809c4c..de1f5d8628 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -185,6 +185,8 @@ static pthread_mutex_t deepest_delta_mutex;
 
 static pthread_key_t key;
 
+static pthread_barrier_t start_barrier;
+
 static inline void lock_mutex(pthread_mutex_t *mutex)
 {
 	if (threads_active)
@@ -209,6 +211,7 @@ static void init_thread(void)
 	if (show_stat)
 		pthread_mutex_init(&deepest_delta_mutex, NULL);
 	pthread_key_create(&key, NULL);
+	pthread_barrier_init(&start_barrier, NULL, nr_threads);
 	CALLOC_ARRAY(thread_data, nr_threads);
 	for (i = 0; i < nr_threads; i++) {
 		thread_data[i].pack_fd = xopen(curr_pack, O_RDONLY);
@@ -231,6 +234,7 @@ static void cleanup_thread(void)
 	for (i = 0; i < nr_threads; i++)
 		close(thread_data[i].pack_fd);
 	pthread_key_delete(key);
+	pthread_barrier_destroy(&start_barrier);
 	free(thread_data);
 }
 
@@ -1100,6 +1104,8 @@ static int compare_ref_delta_entry(const void *a, const void *b)
 
 static void *threaded_second_pass(void *data)
 {
+	if (threads_active)
+		pthread_barrier_wait(&start_barrier);
 	if (data)
 		set_thread_data(data);
 	for (;;) {
@@ -1336,15 +1342,13 @@ static void resolve_deltas(struct pack_idx_option *opts)
 	base_cache_limit = opts->delta_base_cache_limit * nr_threads;
 	if (nr_threads > 1 || getenv("GIT_FORCE_THREADS")) {
 		init_thread();
-		work_lock();
 		for (i = 0; i < nr_threads; i++) {
 			int ret = pthread_create(&thread_data[i].thread, NULL,
 						 threaded_second_pass, thread_data + i);
 			if (ret)
 				die(_("unable to create thread: %s"),
 				    strerror(ret));
 		}
-		work_unlock();
 		for (i = 0; i < nr_threads; i++)
 			pthread_join(thread_data[i].thread, NULL);
 		cleanup_thread();
René Scharfe Dec. 29, 2024, 12:02 p.m. UTC | #3
Am 28.12.2024 um 20:23 schrieb Jeff King:
> On Sat, Dec 28, 2024 at 02:05:41PM -0500, Jeff King wrote:
>
>> So I suspect the race is actually trickier, and that the "weird state"
>> is not something that happens just while pthread_create() is being
>> called, but is actually running _in the thread itself_. So even though
>> pthread_create() has returned for each thread, they are still setting
>> themselves up before running.
>> [...]
>> So a full fix would actually require synchronization where we spawn each
>> thread, then wait for all of them to hit the barrier to declare
>> themselves ready, and then let them all start running. There is a
>> pthread_barrier type that would help with this, but we've never used it
>> before (so we'd probably need to at least provide a Windows compat
>> layer).
>
> So here's a fix that seems to work, but doesn't address the portability
> issues.

Windows has Synchronization Barriers.  Adding the following lines to
compat/win32/pthread.h at least lets your example compile and run:

#include <synchapi.h>
#define pthread_barrier_t SYNCHRONIZATION_BARRIER
#define PTHREAD_BARRIER_SERIAL_THREAD TRUE
#define pthread_barrier_init(b, a, c) return_0(InitializeSynchronizationBarrier((b), (c), -1))
#define pthread_barrier_destroy(b) return_0(DeleteSynchronizationBarrier(b))
#define pthread_barrier_wait(b) EnterSynchronizationBarrier((b), 0)

Catching a non-NULL second argument to pthread_barrier_init() would be a
good idea in a production version.  Error handling would be a good idea
in general, but callers would then actually have to check those errors.

Synchronization Barriers were added with Windows 8 and Windows Server
2012, Git for Windows requires higher versions, so this native
mechanism should be usable.  Relevant links:

https://learn.microsoft.com/en-us/windows/win32/sync/synchronization-barriers
https://github.com/git-for-windows/git/wiki/FAQ


However, macOS doesn't have pthread barriers.  Here's an implementation
that had to be fixed to satisfy Coverity, so it might be good now?

https://github.com/libusb/hidapi/blob/master/mac/hid.c

Perhaps that implementation could be used for Windows as well?  All
functions it uses are provided by compat/win32/pthread.h; not sure if
they are sufficiently fleshed out, though.

René
Jeff King Dec. 29, 2024, 4:57 p.m. UTC | #4
On Sun, Dec 29, 2024 at 01:02:13PM +0100, René Scharfe wrote:

> Synchronization Barriers were added with Windows 8 and Windows Server
> 2012, Git for Windows requires higher versions, so this native
> mechanism should be usable.  Relevant links:
> [...]
> However, macOS doesn't have pthread barriers.  Here's an implementation
> that had to be fixed to satisfy Coverity, so it might be good now?

Yep, that matches what I'd found so far.

One of the reasons I hadn't sent anything is that I was waffling between
two approaches:

  - implement barriers everywhere and just use them. More work, but we'd
    have the tool if we wanted to use it later, and all builds behave
    the same.

  - make a "maybe_barrier" interface that might be a noop, and let most
    platforms compile without them. They are not needed for correct
    operation in most cases, but only to work around a sanitizer problem.
    And it is not even a problem that comes up frequently; it is a race
    that we occasionally see in CI. So enabling it only for our
    linux-leaks CI job would be enough to dull the pain.

    And there is no risk of any portability or run-time issues, because
    the code is a noop for most builds.

> https://github.com/libusb/hidapi/blob/master/mac/hid.c
> 
> Perhaps that implementation could be used for Windows as well?  All
> functions it uses are provided by compat/win32/pthread.h; not sure if
> they are sufficiently fleshed out, though.

Yeah, I found a similar one. I think it's an undergrad OS assignment to
implement barriers using semaphores, but probably building on a mutex
and cond is less horrible. ;)

-Peff
Jeff King Dec. 30, 2024, 4:32 a.m. UTC | #5
On Sun, Dec 29, 2024 at 11:57:15AM -0500, Jeff King wrote:

> One of the reasons I hadn't sent anything is that I was waffling between
> two approaches:
> 
>   - implement barriers everywhere and just use them. More work, but we'd
>     have the tool if we wanted to use it later, and all builds behave
>     the same.
> 
>   - make a "maybe_barrier" interface that might be a noop, and let most
>     platforms compile without them. They are not needed for correct
>     operation in most cases, but only to work around a sanitizer problem.
>     And it is not even a problem that comes up frequently; it is a race
>     that we occasionally see in CI. So enabling it only for our
>     linux-leaks CI job would be enough to dull the pain.
> 
>     And there is no risk of any portability or run-time issues, because
>     the code is a noop for most builds.

I sent the "maybe" interface in this series if you want to take a look:

  https://lore.kernel.org/git/20241230042325.GA112439@coredump.intra.peff.net/

I'm not married to that approach, but it seemed like the easiest place
to start. And I especially wanted to see how much hand-waving was
required to justify it in the third patch. It's a fair bit. ;)

-Peff
Junio C Hamano Dec. 30, 2024, 1:46 p.m. UTC | #6
Jeff King <peff@peff.net> writes:

> One of the reasons I hadn't sent anything is that I was waffling between
> two approaches:
>
>   - implement barriers everywhere and just use them. More work, but we'd
>     have the tool if we wanted to use it later, and all builds behave
>     the same.
>
>   - make a "maybe_barrier" interface that might be a noop, and let most
>     platforms compile without them. They are not needed for correct
>     operation in most cases, but only to work around a sanitizer problem.
>     And it is not even a problem that comes up frequently; it is a race
>     that we occasionally see in CI. So enabling it only for our
>     linux-leaks CI job would be enough to dull the pain.
>
>     And there is no risk of any portability or run-time issues, because
>     the code is a noop for most builds.

I love when people think before committing to an approach, and after
seeing these two cohices, I tend to have slight preference for the
latter over the former.  The work will not be wasted even if it
later turns out that we need a full-blown barrier implementation for
other platforms.

Thanks.
diff mbox series

Patch

diff --git a/t/t0003-attributes.sh b/t/t0003-attributes.sh
index 3c98b622f2..7ecaf8f4e3 100755
--- a/t/t0003-attributes.sh
+++ b/t/t0003-attributes.sh
@@ -443,7 +443,7 @@  test_expect_success 'diff without repository with attr source' '
 		cat >expect <<-EOF &&
 		fatal: cannot use --attr-source or GIT_ATTR_SOURCE without repo
 		EOF
-		test_must_fail env GIT_ATTR_SOURCE=HEAD git grep --no-index foo file 2>err &&
+		test_must_fail env GIT_ATTR_SOURCE=HEAD git grep --threads=1 --no-index foo file 2>err &&
 		test_cmp expect err
 	)
 '