diff mbox series

[1/2] packfile: fix race condition on unpack_entry()

Message ID 42a7948f94cb57ebd9c37c3850b46b1ac9813ec6.1601311803.git.matheus.bernardino@usp.br (mailing list archive)
State New, archived
Headers show
Series Fix race condition and memory leak in delta base cache | expand

Commit Message

Matheus Tavares Sept. 28, 2020, 4:50 p.m. UTC
The third phase of unpack_entry() performs the following sequence in a
loop, until all the deltas enumerated in phase one are applied and the
entry is fully reconstructed:

1. Add the current base entry to the delta base cache
2. Unpack the next delta
3. Patch the unpacked delta on top of the base

When the optional object reading lock is enabled, the above steps will
be performed while holding the lock. However, step 2. momentarily
releases it so that inflation can be performed in parallel for increased
performance. Because the `base` buffer inserted in the cache at 1. is
not duplicated, another thread can potentially free() it while the lock
is released at 2. (e.g. when there is no space left in the cache to
insert another entry). In this case, the later attempt to dereference
`base` at 3. will cause a segmentation fault. This problem was observed
during a multithreaded git-grep execution on a repository with large
objects.

To fix the race condition (and later segmentation fault), let's reorder
the aforementioned steps so that the lock is not released between 1.
and 3. An alternative solution which would not require the reordering
would be to duplicate `base` before inserting it in the cache.
However, as Phil Hord mentioned, memcpy()'ing large bases can negatively
affect performance: in his experiments, this alternative approach slowed
git-grep down by 10% to 20%.

Reported-by: Phil Hord <phil.hord@gmail.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 packfile.c | 41 ++++++++++++++++++++++++-----------------
 1 file changed, 24 insertions(+), 17 deletions(-)

Comments

Junio C Hamano Sept. 28, 2020, 6:05 p.m. UTC | #1
Matheus Tavares <matheus.bernardino@usp.br> writes:

> To fix the race condition (and later segmentation fault), let's reorder
> the aforementioned steps so that the lock is not released between 1.
> and 3.

In other words, we hold the base in core only for ourselves without
adding it to the base cache, apply the delta to produce the result
and then place the base in the cache, and the reason why this change
fixes the breakage is because the base we have locally and not in
cache will not be seen by other people and will not be freed without
our consent?  Which does make sense.

But I was confused by the explanation "lock is not released".  We do
release the same lock when we call unpack_compressed_entry(), and
reaquire it before the unpack_compressed_entry() returns.  What the
reordering achieves is to protect the base from getting evicted when
the unlocking and relocing happens, no?

Thanks.
diff mbox series

Patch

diff --git a/packfile.c b/packfile.c
index 9ef27508f2..0319418d88 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1775,12 +1775,10 @@  void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
 		void *external_base = NULL;
 		unsigned long delta_size, base_size = size;
 		int i;
+		off_t base_obj_offset = obj_offset;
 
 		data = NULL;
 
-		if (base)
-			add_delta_base_cache(p, obj_offset, base, base_size, type);
-
 		if (!base) {
 			/*
 			 * We're probably in deep shit, but let's try to fetch
@@ -1818,24 +1816,33 @@  void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
 			      "at offset %"PRIuMAX" from %s",
 			      (uintmax_t)curpos, p->pack_name);
 			data = NULL;
-			free(external_base);
-			continue;
-		}
+		} else {
+			data = patch_delta(base, base_size, delta_data,
+					   delta_size, &size);
 
-		data = patch_delta(base, base_size,
-				   delta_data, delta_size,
-				   &size);
+			/*
+			 * We could not apply the delta; warn the user, but
+			 * keep going. Our failure will be noticed either in
+			 * the next iteration of the loop, or if this is the
+			 * final delta, in the caller when we return NULL.
+			 * Those code paths will take care of making a more
+			 * explicit warning and retrying with another copy of
+			 * the object.
+			 */
+			if (!data)
+				error("failed to apply delta");
+		}
 
 		/*
-		 * We could not apply the delta; warn the user, but keep going.
-		 * Our failure will be noticed either in the next iteration of
-		 * the loop, or if this is the final delta, in the caller when
-		 * we return NULL. Those code paths will take care of making
-		 * a more explicit warning and retrying with another copy of
-		 * the object.
+		 * We delay adding `base` to the cache until the end of the loop
+		 * because unpack_compressed_entry() momentarily releases the
+		 * obj_read_mutex, giving another thread the chance to access
+		 * the cache. Therefore, if `base` was already there, this other
+		 * thread could free() it (e.g. to make space for another entry)
+		 * before we are done using it.
 		 */
-		if (!data)
-			error("failed to apply delta");
+		if (!external_base)
+			add_delta_base_cache(p, base_obj_offset, base, base_size, type);
 
 		free(delta_data);
 		free(external_base);