diff mbox series

[2/2] packfile: fix memory leak in add_delta_base_cache()

Message ID 5b6e3019e08c6bccdee29018e99b0c6933fe05e0.1601311803.git.matheus.bernardino@usp.br (mailing list archive)
State New, archived
Headers show
Series Fix race condition and memory leak in delta base cache | expand

Commit Message

Matheus Tavares Sept. 28, 2020, 4:50 p.m. UTC
When add_delta_base_cache() is called with a base that is already in the
cache, no operation is performed. But the check is done after allocating
space for a new entry, so we end up leaking memory on the early return.
Also, the caller always expect that the base will be inserted, so it
never free()'s it. To fix both of these memory leaks, let's move the
allocation of a new entry further down in add_delta_base_cache(), and
make the function return an integer to indicate whether the insertion
was performed or not. Then, make the caller free() the base when needed.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 packfile.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

Comments

Junio C Hamano Sept. 28, 2020, 6:22 p.m. UTC | #1
Matheus Tavares <matheus.bernardino@usp.br> writes:

> When add_delta_base_cache() is called with a base that is already in the
> cache, no operation is performed. But the check is done after allocating
> space for a new entry, so we end up leaking memory on the early return.

Wow, that's so obvious a leak that it is surprising it has been
unnoticed, especially given that the runtime inflation of the
packfile was written so long time ago and was a central part of the
system.

I had to dig and find out that the breakage was fairly recent from
early this year, made in 31877c9a (object-store: allow threaded
access to object reading, 2020-01-15).

> Also, the caller always expect that the base will be inserted, so it
> never free()'s it. To fix both of these memory leaks, let's move the
> allocation of a new entry further down in add_delta_base_cache(), and
> make the function return an integer to indicate whether the insertion
> was performed or not. Then, make the caller free() the base when needed.
>
> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>

> @@ -1841,8 +1843,10 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
>  		 * thread could free() it (e.g. to make space for another entry)
>  		 * before we are done using it.
>  		 */
> -		if (!external_base)
> -			add_delta_base_cache(p, base_obj_offset, base, base_size, type);
> +		if (!external_base && !add_delta_base_cache(p, base_obj_offset,
> +						base, base_size, type)) {
> +			free(base);
> +		}

When you have to wrap a long expression, try to split after an
operator near the root of the parse tree, e.g.

		if (!external_base &&
		    !add_delta_base_cache(p, base_obj_offset, base, base_size, type)) {

would make the result easier to follow.

I however suspect that it may be better let add_delta_base_cache()
do the freeing.  There is only one caller, and from its point of
view, the timing when it throws the base at the cache (after the
previous patch) is when it is done with it.

In other words we can think of the call to add_delta_base_cache() as
the caller saying: "I am done with this, but somebody else might
want to reuse it later, so do whatever you want to do with it".  

If we were to go that route, it might even make sense to rename it
to reflect that mentality from the viewpoint of the caller, but a
single-caller helper like this one it may not matter all that much.

Thanks.
diff mbox series

Patch

diff --git a/packfile.c b/packfile.c
index 0319418d88..177793e01a 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1471,10 +1471,10 @@  void clear_delta_base_cache(void)
 	}
 }
 
-static void add_delta_base_cache(struct packed_git *p, off_t base_offset,
+static int add_delta_base_cache(struct packed_git *p, off_t base_offset,
 	void *base, unsigned long base_size, enum object_type type)
 {
-	struct delta_base_cache_entry *ent = xmalloc(sizeof(*ent));
+	struct delta_base_cache_entry *ent;
 	struct list_head *lru, *tmp;
 
 	/*
@@ -1483,7 +1483,7 @@  static void add_delta_base_cache(struct packed_git *p, off_t base_offset,
 	 * and III might run concurrently across multiple threads).
 	 */
 	if (in_delta_base_cache(p, base_offset))
-		return;
+		return 0;
 
 	delta_base_cached += base_size;
 
@@ -1495,6 +1495,7 @@  static void add_delta_base_cache(struct packed_git *p, off_t base_offset,
 		release_delta_base_cache(f);
 	}
 
+	ent = xmalloc(sizeof(*ent));
 	ent->key.p = p;
 	ent->key.base_offset = base_offset;
 	ent->type = type;
@@ -1506,6 +1507,7 @@  static void add_delta_base_cache(struct packed_git *p, off_t base_offset,
 		hashmap_init(&delta_base_cache, delta_base_cache_hash_cmp, NULL, 0);
 	hashmap_entry_init(&ent->ent, pack_entry_hash(p, base_offset));
 	hashmap_add(&delta_base_cache, &ent->ent);
+	return 1;
 }
 
 int packed_object_info(struct repository *r, struct packed_git *p,
@@ -1841,8 +1843,10 @@  void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
 		 * thread could free() it (e.g. to make space for another entry)
 		 * before we are done using it.
 		 */
-		if (!external_base)
-			add_delta_base_cache(p, base_obj_offset, base, base_size, type);
+		if (!external_base && !add_delta_base_cache(p, base_obj_offset,
+						base, base_size, type)) {
+			free(base);
+		}
 
 		free(delta_data);
 		free(external_base);