
RFC - concurrency causes segfault in git grep since 2.26.0

Message ID CABURp0r_VPNeU5ugbJDQo=hV_aOit7W-pf9nhDF9Og=vVJG8tw@mail.gmail.com (mailing list archive)
State New, archived

Commit Message

Phil Hord Sept. 25, 2020, 2:36 a.m. UTC
Hi list.  It's been a while.

I can reliably reproduce a segfault in git grep when searching the
object DB. It is caused by a race when threads > 2.

I have a patch that fixes the problem, but I don't know exactly why.
Can someone else explain it and/or offer a better solution?

====

diff --git packfile.c packfile.c
index e69012e7f2..52b7b54aeb 100644
--- packfile.c
+++ packfile.c
@@ -1812,7 +1812,9 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
                if (!base)
                        continue;

+               obj_read_lock();
                delta_data = unpack_compressed_entry(p, &w_curs, curpos, delta_size);
+               obj_read_unlock();

                if (!delta_data) {
                        error("failed to unpack compressed delta "

====

My notes thus far:

I have seen it fail with `--threads 3`, but it fails very reliably for
me with `--threads 20`.

My repo is large (7GB), which gives grep enough to do. I'm not able to
make it fail on git.git; it seems like it's too small to run into this
problem.

The problem bisects to 1d1729caeb.  It was somewhat hard to find this,
though, because git grep disables --threads when reading from the
object store until another patch later in the same series.

====

$ git bisect run sh -c \
    "sed -i -e 's/list.nr || cached || show_in_pager/list.nr || show_in_pager/' builtin/grep.c || exit 129;
    make -j 30 || exit 125;
    git checkout HEAD -- builtin/grep.c ;
    ./bin-wrappers/git -C ~/git/purity grep --threads 20 --cached 'target-regex' || exit 1"

1d1729caebd41b340dd8dd61057f613da4df526c is the first bad commit
commit 1d1729caebd41b340dd8dd61057f613da4df526c
Author: Matheus Tavares <matheus.bernardino@usp.br>
Date:   Wed Jan 15 23:39:54 2020 -0300

    grep: replace grep_read_mutex by internal obj read lock

Here's what I'm testing with:
$ ./bin-wrappers/git --version
git version 2.28.0.585.ge1cfff6765

I have found that the crash always happens in cache_or_unpack_entry.
The thread that segfaults is always in patch_delta, and there are
always at least two threads inside unpack_compressed_entry at the time
of the failure.

I was able to get it to crash with just three threads one time.  I
copied a log with the stack trace of all (4) threads during that run
to a gist [1].

Another log showing four different segfaults when run with 40 threads
is also in that gist [1].

[1] https://gist.github.com/phord/02e84d003688baa493b978110225d443

Comments

Matheus Tavares Sept. 25, 2020, 5:52 a.m. UTC | #1
Hi, Phil

Thanks for the bug report and for the crash logs.

On Thu, Sep 24, 2020 at 11:36 PM Phil Hord <phil.hord@gmail.com> wrote:
>
> Hi list.  It's been a while.
>
> I can reliably reproduce a segfault in git grep when searching the
> object DB. It is caused by a race when threads > 2.
>
> I have a patch that fixes the problem, but I don't know exactly why.
> Can someone else explain it and/or offer a better solution?
>
> ====
>
> diff --git packfile.c packfile.c
> index e69012e7f2..52b7b54aeb 100644
> --- packfile.c
> +++ packfile.c
> @@ -1812,7 +1812,9 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
>                 if (!base)
>                         continue;
>
> +               obj_read_lock();
>                 delta_data = unpack_compressed_entry(p, &w_curs, curpos, delta_size);
> +               obj_read_unlock();
>
>                 if (!delta_data) {
>                         error("failed to unpack compressed delta "
>
> ====

Hmm, obj_read_mutex is a recursive lock, so by adding the
obj_read_lock() call here, we are incrementing the lock value to [at
least] 2 (because oid_object_info_extended() had already incremented
once). Therefore, when unpack_compressed_entry() calls obj_read_unlock()
before inflating the entry, the lock is not actually released, only
decremented. So the effect of this change is that phase III of
unpack_entry() is performed entirely without releasing the lock.

Your crash log shows us that the segfault happened when trying to
memcpy() the string `base` (in unpack_entry()). I.e., the same string
that we had previously added to the delta base cache, right before
calling unpack_compressed_entry() and releasing the lock. The
problematic sequence is:
	add `base` to the cache -> release lock -> inflate ->
	acquire lock -> try to use `base`

Maybe, when a thread X releases the lock to perform decompression,
thread Y acquires the lock and tries to add another base to the cache.
The cache is full, so Y has to remove entries, and it ends up free()'ing
the base that was just inserted by X. Later, X tries to dereference
`base` in patch_delta(), which would cause a segfault. It would also
explain why your change solves the segfault.

I'm not sure, though, why this entry would be removed, since it was just
added... Maybe Y was adding a huge base to the cache, which required
removing all entries?

Anyway, you mentioned you can reproduce the failure quite reliably in your
repo with 20 threads, right? Could you please check with the following patch
applied? It adds a copy of `base` to the cache instead of the original buffer,
allowing us to keep running decompression in parallel, but without the risk of
another thread free()'ing `base` in the meantime.

diff --git a/packfile.c b/packfile.c
index 9ef27508f2..79f83b6034 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1772,14 +1772,15 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
 	while (delta_stack_nr) {
 		void *delta_data;
 		void *base = data;
-		void *external_base = NULL;
 		unsigned long delta_size, base_size = size;
 		int i;
 
 		data = NULL;
 
-		if (base)
-			add_delta_base_cache(p, obj_offset, base, base_size, type);
+		if (base) {
+			char *base_dup = memcpy(xmalloc(base_size), base, base_size);
+			add_delta_base_cache(p, obj_offset, base_dup, base_size, type);
+		}
 
 		if (!base) {
 			/*
@@ -1799,7 +1800,6 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
 				      p->pack_name);
 				mark_bad_packed_object(p, base_oid.hash);
 				base = read_object(r, &base_oid, &type, &base_size);
-				external_base = base;
 			}
 		}
 
@@ -1818,7 +1818,7 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
 			      "at offset %"PRIuMAX" from %s",
 			      (uintmax_t)curpos, p->pack_name);
 			data = NULL;
-			free(external_base);
+			free(base);
 			continue;
 		}
 
@@ -1838,7 +1838,7 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
 			error("failed to apply delta");
 
 		free(delta_data);
-		free(external_base);
+		free(base);
 	}
 
 	if (final_type)
--
Phil Hord Sept. 25, 2020, 7:53 p.m. UTC | #2
Hi Matheus,

Thanks for the quick response.

On Thu, Sep 24, 2020 at 10:53 PM Matheus Tavares
<matheus.bernardino@usp.br> wrote:
>
> Hi, Phil
>
> Thanks for the bug report and for the crash logs.
>
> On Thu, Sep 24, 2020 at 11:36 PM Phil Hord <phil.hord@gmail.com> wrote:
> >
> > Hi list.  It's been a while.
> >
> > I can reliably reproduce a segfault in git grep when searching the
> > object DB. It is caused by a race when threads > 2.
> >
> > I have a patch that fixes the problem, but I don't know exactly why.
> > Can someone else explain it and/or offer a better solution?
> >
> > ====
> >
> > diff --git packfile.c packfile.c
> > index e69012e7f2..52b7b54aeb 100644
> > --- packfile.c
> > +++ packfile.c
> > @@ -1812,7 +1812,9 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
> >                 if (!base)
> >                         continue;
> >
> > +               obj_read_lock();
> >                 delta_data = unpack_compressed_entry(p, &w_curs, curpos, delta_size);
> > +               obj_read_unlock();
> >
> >                 if (!delta_data) {
> >                         error("failed to unpack compressed delta "
> >
> > ====
>
> Hmm, obj_read_mutex is a recursive lock, so by adding the
> obj_read_lock() call here, we are incrementing the lock value to [at
> least] 2 (because oid_object_info_extended() had already incremented
> once). Therefore, when unpack_compressed_entry() calls obj_read_unlock()
> before inflating the entry, the lock is not actually released, only
> decremented. So the effect of this change is that the phase III of
> unpack_entry() is performed entirely without releasing the lock.

Ah, there's the piece I was missing.  I didn't realize we temporarily
drop the lock inside unpack_compressed_entry.  And obviously I also
didn't notice that we were already holding a lock before entering
unpack_entry.  Thanks for the detailed explanation.

> Your crash log shows us that the segfault happened when trying to
> memcpy() the string `base` (in unpack_entry()). I.e., the same string
> that we had previously added to the delta base cache, right before
> calling unpack_compressed_entry() and releasing the lock. The
> problematic sequence is:
>         add `base` to the cache -> release lock -> inflate ->
>         acquire lock -> try to use `base`
>
> Maybe, when a thread X releases the lock to perform decompression,
> thread Y acquires the lock and tries to add another base to the cache.
> The cache is full, so Y has to remove entries, and it ends up free()'ing
> the base that was just inserted by X. Later, X tries to dereference
> `base` in patch_delta(), which would cause a segfault. It would also
> explain why your change solves the segfault.
>
> I'm not sure, though, why this entry would be removed, since it was just
> added... Maybe Y was adding a huge base to the cache, which required
> removing all entries?

I think your theory is correct here on both counts.  The repo I am
searching has some sinfully large objects, and when there is a cache
limit, they are likely to exceed it.

> Anyways, you mentioned you can reproduce the failure quite reliably in your
> repo with 20 threads, right? Could you please check with the following patch
> applied? It adds a `base` copy to the cache instead of the original string,
> allowing us to keep running decompression in parallel but without the risk of
> having another thread free()'ing `base` in the mean time.


Yes, your patch reliably solves the problem, too.  The performance is
only slightly improved over holding the lock during inflate in my
case, but it does save a bit of time.  Surprisingly, performance seems
to peak between 5 and 10 threads for me in both cases.  It reliably
gets slightly worse at 20 threads and 40 threads.

In all cases when threads > 2, this search appears to be 10%-20%
slower than running without this change.  Even when threads=1, though,
the result is about 10% slower.  Perhaps it is worth performing the
base_dup only when obj_read_use_lock is set, and skipping the copy
in the single-threaded case.

> diff --git a/packfile.c b/packfile.c
> index 9ef27508f2..79f83b6034 100644
> --- a/packfile.c
> +++ b/packfile.c
> @@ -1772,14 +1772,15 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
>         while (delta_stack_nr) {
>                 void *delta_data;
>                 void *base = data;
> -               void *external_base = NULL;
>                 unsigned long delta_size, base_size = size;
>                 int i;
>
>                 data = NULL;
>
> -               if (base)
> -                       add_delta_base_cache(p, obj_offset, base, base_size, type);
> +               if (base) {
> +                       char *base_dup = memcpy(xmalloc(base_size), base, base_size);
> +                       add_delta_base_cache(p, obj_offset, base_dup, base_size, type);
> +               }
>
>                 if (!base) {
>                         /*
> @@ -1799,7 +1800,6 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
>                                       p->pack_name);
>                                 mark_bad_packed_object(p, base_oid.hash);
>                                 base = read_object(r, &base_oid, &type, &base_size);
> -                               external_base = base;
>                         }
>                 }
>
> @@ -1818,7 +1818,7 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
>                               "at offset %"PRIuMAX" from %s",
>                               (uintmax_t)curpos, p->pack_name);
>                         data = NULL;
> -                       free(external_base);
> +                       free(base);
>                         continue;
>                 }
>
> @@ -1838,7 +1838,7 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
>                         error("failed to apply delta");
>
>                 free(delta_data);
> -               free(external_base);
> +               free(base);
>         }
>
>         if (final_type)
> --