Message ID | 20241228190541.GA815586@coredump.intra.peff.net (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | racy leak sanitizer builds, was Re: [PATCH 0/9] commit-reach: -Wsign-compare follow-ups | expand |
On Sat, Dec 28, 2024 at 02:05:41PM -0500, Jeff King wrote: > So I suspect the race is actually trickier, and that the "weird state" > is not something that happens just while pthread_create() is being > called, but is actually running _in the thread itself_. So even though > pthread_create() has returned for each thread, they are still setting > themselves up before running. > [...] > So a full fix would actually require synchronization where we spawn each > thread, then wait for all of them to hit the barrier to declare > themselves ready, and then let them all start running. There is a > pthread_barrier type that would help with this, but we've never used it > before (so we'd probably need to at least provide a Windows compat > layer). So here's a fix that seems to work, but doesn't address the portability issues. One tweak is that the exit() call is actually not in a sub-thread, but in the main thread itself. I don't think that really changes the analysis too much, but it does mean we need to include the main thread in the barrier wait. diff --git a/builtin/grep.c b/builtin/grep.c index d00ee76f24..933b4503b8 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -101,6 +101,9 @@ static pthread_cond_t cond_write; /* Signalled when we are finished with everything. */ static pthread_cond_t cond_result; +/* Synchronize the start of all threads */ +static pthread_barrier_t start_barrier; + static int skip_first_line; static void add_work(struct grep_opt *opt, struct grep_source *gs) @@ -198,6 +201,8 @@ static void *run(void *arg) int hit = 0; struct grep_opt *opt = arg; + pthread_barrier_wait(&start_barrier); + while (1) { struct work_item *w = get_work(); if (!w) @@ -229,6 +234,7 @@ static void start_threads(struct grep_opt *opt) pthread_cond_init(&cond_add, NULL); pthread_cond_init(&cond_write, NULL); pthread_cond_init(&cond_result, NULL); + pthread_barrier_init(&start_barrier, NULL, num_threads + 1); grep_use_locks = 1; enable_obj_read_lock(); @@ -248,6 +254,7 @@ static void start_threads(struct grep_opt *opt) die(_("grep: failed to create thread: %s"), strerror(err)); } + pthread_barrier_wait(&start_barrier); } static int wait_all(void) @@ -284,6 +291,7 @@ static int wait_all(void) pthread_cond_destroy(&cond_add); pthread_cond_destroy(&cond_write); pthread_cond_destroy(&cond_result); + pthread_barrier_destroy(&start_barrier); grep_use_locks = 0; disable_obj_read_lock(); It would probably be pretty easy to do something similar for index-pack, though really this is something I guess we'd need to do manually anywhere we spawn worker threads. -Peff
On Sat, Dec 28, 2024 at 02:23:07PM -0500, Jeff King wrote: > It would probably be pretty easy to do something similar for index-pack, > though really this is something I guess we'd need to do manually > anywhere we spawn worker threads. Here's what that would look like. Confirmed that this fixes the issue running t5309 with --stress. Dropping the extra locks in resolve_deltas() is just backing out that earlier insufficient attempt to synchronize things. diff --git a/builtin/index-pack.c b/builtin/index-pack.c index d773809c4c..de1f5d8628 100644 --- a/builtin/index-pack.c +++ b/builtin/index-pack.c @@ -185,6 +185,8 @@ static pthread_mutex_t deepest_delta_mutex; static pthread_key_t key; +static pthread_barrier_t start_barrier; + static inline void lock_mutex(pthread_mutex_t *mutex) { if (threads_active) @@ -209,6 +211,7 @@ static void init_thread(void) if (show_stat) pthread_mutex_init(&deepest_delta_mutex, NULL); pthread_key_create(&key, NULL); + pthread_barrier_init(&start_barrier, NULL, nr_threads); CALLOC_ARRAY(thread_data, nr_threads); for (i = 0; i < nr_threads; i++) { thread_data[i].pack_fd = xopen(curr_pack, O_RDONLY); @@ -231,6 +234,7 @@ static void cleanup_thread(void) for (i = 0; i < nr_threads; i++) close(thread_data[i].pack_fd); pthread_key_delete(key); + pthread_barrier_destroy(&start_barrier); free(thread_data); } @@ -1100,6 +1104,8 @@ static int compare_ref_delta_entry(const void *a, const void *b) static void *threaded_second_pass(void *data) { + if (threads_active) + pthread_barrier_wait(&start_barrier); if (data) set_thread_data(data); for (;;) { @@ -1336,15 +1342,13 @@ static void resolve_deltas(struct pack_idx_option *opts) base_cache_limit = opts->delta_base_cache_limit * nr_threads; if (nr_threads > 1 || getenv("GIT_FORCE_THREADS")) { init_thread(); - work_lock(); for (i = 0; i < nr_threads; i++) { int ret = pthread_create(&thread_data[i].thread, NULL, threaded_second_pass, thread_data + i); if (ret) die(_("unable to create thread: %s"), strerror(ret)); } - work_unlock(); for (i = 0; i < nr_threads; i++) pthread_join(thread_data[i].thread, NULL); cleanup_thread();
Am 28.12.2024 um 20:23 schrieb Jeff King: > On Sat, Dec 28, 2024 at 02:05:41PM -0500, Jeff King wrote: > >> So I suspect the race is actually trickier, and that the "weird state" >> is not something that happens just while pthread_create() is being >> called, but is actually running _in the thread itself_. So even though >> pthread_create() has returned for each thread, they are still setting >> themselves up before running. >> [...] >> So a full fix would actually require synchronization where we spawn each >> thread, then wait for all of them to hit the barrier to declare >> themselves ready, and then let them all start running. There is a >> pthread_barrier type that would help with this, but we've never used it >> before (so we'd probably need to at least provide a Windows compat >> layer). > > So here's a fix that seems to work, but doesn't address the portability > issues. Windows has Synchronization Barriers. Adding the following lines to compat/win32/pthread.h at least lets your example compile and run: #include <synchapi.h> #define pthread_barrier_t SYNCHRONIZATION_BARRIER #define PTHREAD_BARRIER_SERIAL_THREAD TRUE #define pthread_barrier_init(b, a, c) return_0(InitializeSynchronizationBarrier((b), (c), -1)) #define pthread_barrier_destroy(b) return_0(DeleteSynchronizationBarrier(b)) #define pthread_barrier_wait(b) EnterSynchronizationBarrier((b), 0) Catching a non-NULL second argument to pthread_barrier_init() would be a good idea in a production version. Error handling would be a good idea in general, but callers would then actually have to check those errors. Synchronization Barriers were added with Windows 8 and Windows Server 2012, Git for Windows requires higher versions, so this native mechanism should be usable. Relevant links: https://learn.microsoft.com/en-us/windows/win32/sync/synchronization-barriers https://github.com/git-for-windows/git/wiki/FAQ However, macOS doesn't have pthread barriers. Here's an implementation that had to be fixed to satisfy Coverity, so it might be good now? https://github.com/libusb/hidapi/blob/master/mac/hid.c Perhaps that implementation could be used for Windows as well? All functions it uses are provided by compat/win32/pthread.h; not sure if they are sufficiently fleshed out, though. René
On Sun, Dec 29, 2024 at 01:02:13PM +0100, René Scharfe wrote: > Synchronization Barriers were added with Windows 8 and Windows Server > 2012, Git for Windows requires higher versions, so this native > mechanism should be usable. Relevant links: > [...] > However, macOS doesn't have pthread barriers. Here's an implementation > that had to be fixed to satisfy Coverity, so it might be good now? Yep, that matches what I'd found so far. One of the reasons I hadn't sent anything is that I was waffling between two approaches: - implement barriers everywhere and just use them. More work, but we'd have the tool if we wanted to use it later, and all builds behave the same. - make a "maybe_barrier" interface that might be a noop, and let most platforms compile without them. They are not needed for correct operation in most cases, but only to work around a sanitizer problem. And it is not even a problem that comes up frequently; it is a race that we occasionally see in CI. So enabling it only for our linux-leaks CI job would be enough to dull the pain. And there is no risk of any portability or run-time issues, because the code is a noop for most builds. > https://github.com/libusb/hidapi/blob/master/mac/hid.c > > Perhaps that implementation could be used for Windows as well? All > functions it uses are provided by compat/win32/pthread.h; not sure if > they are sufficiently fleshed out, though. Yeah, I found a similar one. I think it's an undergrad OS assignment to implement barriers using semaphores, but probably building on a mutex and cond is less horrible. ;) -Peff
On Sun, Dec 29, 2024 at 11:57:15AM -0500, Jeff King wrote: > One of the reasons I hadn't sent anything is that I was waffling between > two approaches: > > - implement barriers everywhere and just use them. More work, but we'd > have the tool if we wanted to use it later, and all builds behave > the same. > > - make a "maybe_barrier" interface that might be a noop, and let most > platforms compile without them. They are not needed for correct > operation in most cases, but only to work around a sanitizer problem. > And it is not even a problem that comes up frequently; it is a race > that we occasionally see in CI. So enabling it only for our > linux-leaks CI job would be enough to dull the pain. > > And there is no risk of any portability or run-time issues, because > the code is a noop for most builds. I sent the "maybe" interface in this series if you want to take a look: https://lore.kernel.org/git/20241230042325.GA112439@coredump.intra.peff.net/ I'm not married to that approach, but it seemed like the easiest place to start. And I especially wanted to see how much hand-waving was required to justify it in the third patch. It's a fair bit. ;) -Peff
Jeff King <peff@peff.net> writes: > One of the reasons I hadn't sent anything is that I was waffling between > two approaches: > > - implement barriers everywhere and just use them. More work, but we'd > have the tool if we wanted to use it later, and all builds behave > the same. > > - make a "maybe_barrier" interface that might be a noop, and let most > platforms compile without them. They are not needed for correct > operation in most cases, but only to work around a sanitizer problem. > And it is not even a problem that comes up frequently; it is a race > that we occasionally see in CI. So enabling it only for our > linux-leaks CI job would be enough to dull the pain. > > And there is no risk of any portability or run-time issues, because > the code is a noop for most builds. I love when people think before committing to an approach, and after seeing these two cohices, I tend to have slight preference for the latter over the former. The work will not be wasted even if it later turns out that we need a full-blown barrier implementation for other platforms. Thanks.
diff --git a/t/t0003-attributes.sh b/t/t0003-attributes.sh index 3c98b622f2..7ecaf8f4e3 100755 --- a/t/t0003-attributes.sh +++ b/t/t0003-attributes.sh @@ -443,7 +443,7 @@ test_expect_success 'diff without repository with attr source' ' cat >expect <<-EOF && fatal: cannot use --attr-source or GIT_ATTR_SOURCE without repo EOF - test_must_fail env GIT_ATTR_SOURCE=HEAD git grep --no-index foo file 2>err && + test_must_fail env GIT_ATTR_SOURCE=HEAD git grep --threads=1 --no-index foo file 2>err && test_cmp expect err ) '