diff mbox series

[10/15] name-rev: restructure creating/updating 'struct rev_name' instances

Message ID 20190919214712.7348-11-szeder.dev@gmail.com (mailing list archive)
State New, archived
Headers show
Series name-rev: eliminate recursion | expand

Commit Message

SZEDER Gábor Sept. 19, 2019, 9:47 p.m. UTC
At the beginning of the recursive name_rev() function it creates a new
'struct rev_name' instance for each previously unvisited commit or, if
this visit results in better name for an already visited commit, then
updates the 'struct rev_name' instance attached to to the commit, or
returns early.

Restructure this so it's caller creates or updates the 'struct
rev_name' instance associated with the commit to be passed as
parameter, i.e. both name_ref() before calling name_rev() and
name_rev() itself as it iterates over the parent commits.

This makes eliminating the recursion a bit easier to follow, and it
will be moved back to name_rev() after the recursion is eliminated.

This change also plugs the memory leak that was temporarily unplugged
in the earlier "name-rev: pull out deref handling from the recursion"
patch in this series.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 builtin/name-rev.c | 35 +++++++++++++++++++++--------------
 1 file changed, 21 insertions(+), 14 deletions(-)

Comments

Derrick Stolee Sept. 20, 2019, 3:27 p.m. UTC | #1
On 9/19/2019 5:47 PM, SZEDER Gábor wrote:
> At the beginning of the recursive name_rev() function it creates a new
> 'struct rev_name' instance for each previously unvisited commit or, if
> this visit results in better name for an already visited commit, then
> updates the 'struct rev_name' instance attached to to the commit, or
> returns early.
> 
> Restructure this so it's caller creates or updates the 'struct
> rev_name' instance associated with the commit to be passed as
> parameter, i.e. both name_ref() before calling name_rev() and
> name_rev() itself as it iterates over the parent commits.
> 
> This makes eliminating the recursion a bit easier to follow, and it
> will be moved back to name_rev() after the recursion is eliminated.
> 
> This change also plugs the memory leak that was temporarily unplugged
> in the earlier "name-rev: pull out deref handling from the recursion"
> patch in this series.
[snip]
>  
> @@ -276,11 +277,17 @@ static int name_ref(const char *path, const struct object_id *oid, int flags, vo
>  		path = name_ref_abbrev(path, can_abbreviate_output);
>  		if (commit->date >= cutoff) {
>  			const char *tip_name;
> +			char *to_free = NULL;
>  			if (deref)
> -				tip_name = xstrfmt("%s^0", path);
> +				tip_name = to_free = xstrfmt("%s^0", path);
>  			else
>  				tip_name = xstrdup(path);

So this xstrdup(path) is not a leak?

> -			name_rev(commit, tip_name, taggerdate, 0, 0, from_tag);
> +			if (create_or_update_name(commit, tip_name, taggerdate,
> +						  0, 0, from_tag))
> +				name_rev(commit, tip_name, taggerdate, 0, 0,
> +					 from_tag);
> +			else
> +				free(to_free);
>  		}
>  	}
>  	return 0;
>
SZEDER Gábor Sept. 20, 2019, 5:09 p.m. UTC | #2
On Fri, Sep 20, 2019 at 11:27:49AM -0400, Derrick Stolee wrote:
> On 9/19/2019 5:47 PM, SZEDER Gábor wrote:
> > At the beginning of the recursive name_rev() function it creates a new
> > 'struct rev_name' instance for each previously unvisited commit or, if
> > this visit results in better name for an already visited commit, then
> > updates the 'struct rev_name' instance attached to to the commit, or
> > returns early.
> > 
> > Restructure this so it's caller creates or updates the 'struct
> > rev_name' instance associated with the commit to be passed as
> > parameter, i.e. both name_ref() before calling name_rev() and
> > name_rev() itself as it iterates over the parent commits.
> > 
> > This makes eliminating the recursion a bit easier to follow, and it
> > will be moved back to name_rev() after the recursion is eliminated.
> > 
> > This change also plugs the memory leak that was temporarily unplugged
> > in the earlier "name-rev: pull out deref handling from the recursion"
> > patch in this series.
> [snip]
> >  
> > @@ -276,11 +277,17 @@ static int name_ref(const char *path, const struct object_id *oid, int flags, vo
> >  		path = name_ref_abbrev(path, can_abbreviate_output);
> >  		if (commit->date >= cutoff) {
> >  			const char *tip_name;
> > +			char *to_free = NULL;
> >  			if (deref)
> > -				tip_name = xstrfmt("%s^0", path);
> > +				tip_name = to_free = xstrfmt("%s^0", path);
> >  			else
> >  				tip_name = xstrdup(path);
> 
> So this xstrdup(path) is not a leak?

Well... yes, everything is leaked, eventually ;)

First of all, name_ref() is a callback function invoked by
for_each_ref(), and 'path' is one of its parameters.  This means
that we must duplicate this string, because we might need to access it
even after iterating over refs is over (when displaying the name of
the commits).

This copy then becomes the initial name_rev() invocation's 'tip_name'
parameter, and things get a bit harder to reason about because of
merges, the recursion, and being in the middle of the refactorings...

So for argument's sake look at current master's 'builtin/name-rev.c',
and assume that it has to deal with a linear history with a single
branch, and no tags.  When name_ref() is invoked with a branch, then
deref = 0, so in name_rev() the condition:

  if (deref) {
        tip_name = to_free = xstrfmt("%s^0", tip_name);

won't be fulfilled, and 'tip_name' remains unchanged.  Then comes the
initialization of the 'struct rev_name' associated with the tip commit
in the commit slab, including the

  name->tip_name = tip_name;

assignment, IOW we now have a pointer to the original 'tip_name' in
the commit slab.

Then name_rev() looks at the tip commit's parents, or rather, because
of the linear history, at its only parent.  This means that neither of
the other two xstrfmt() will be invoked, and name_rev() will be
recursively invoked with the original 'tip_name' pointer.  This will
then initialize another 'struct rev_name' instance, including yet
another pointer to the original 'tip_name'.

This will go on until the root commit is reached, and in the end every
commit will have an associated 'struct rev_name' instance with a
pointer to the original 'tip_name' string.

At this point we're done with the recursion, and since the repo has
only a single branch, we're done with for_each_ref() as well, and
finally print names of the commits.

Then it's time to clean up and free() memory, but:

  - we could only release the commit slab and free() all 'struct
    rev_name' instances within, but we can't free() the
    'name->tip_name' pointers, because, as shown above, we might have
    a bunch of pointers pointing to the same string.

  - we are at the end of cmd_name_rev() and are about to exit the
    program, so the OS will release any and all resources anyway.
    Yeah, it's far from ready to be turned into a library call...

Now, in a real repository we'll have multiple branches, tags, and
merges, so there will be cases when an existing 'struct rev_name'
instance is updated, and it's 'name->tip_name' is overwritten by a
better name.  At that point the old value is potentially leaked, but
we can't really do anything about it, because we don't know whether
any other 'struct rev_name' instances point to it.

Eliminating the recursion doesn't change anything in this respect.
diff mbox series

Patch

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 99643aa4dc..98a549fef7 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -107,14 +107,12 @@  static void name_rev(struct commit *commit,
 	struct commit_list *parents;
 	int parent_number = 1;
 
-	if (!create_or_update_name(commit, tip_name, taggerdate, generation,
-				   distance, from_tag))
-		return;
-
 	for (parents = commit->parents;
 			parents;
 			parents = parents->next, parent_number++) {
 		struct commit *parent = parents->item;
+		const char *new_name;
+		int new_generation, new_distance;
 
 		parse_commit(parent);
 		if (parent->date < cutoff)
@@ -122,7 +120,6 @@  static void name_rev(struct commit *commit,
 
 		if (parent_number > 1) {
 			size_t len;
-			char *new_name;
 
 			strip_suffix(tip_name, "^0", &len);
 			if (generation > 0)
@@ -131,15 +128,19 @@  static void name_rev(struct commit *commit,
 			else
 				new_name = xstrfmt("%.*s^%d", (int)len, tip_name,
 						   parent_number);
-
-			name_rev(parent, new_name, taggerdate, 0,
-				 distance + MERGE_TRAVERSAL_WEIGHT,
-				 from_tag);
+			new_generation = 0;
+			new_distance = distance + MERGE_TRAVERSAL_WEIGHT;
 		} else {
-			name_rev(parent, tip_name, taggerdate,
-				 generation + 1, distance + 1,
-				 from_tag);
+			new_name = tip_name;
+			new_generation = generation + 1;
+			new_distance = distance + 1;
 		}
+
+		if (create_or_update_name(parent, new_name, taggerdate,
+					  new_generation, new_distance,
+					  from_tag))
+			name_rev(parent, new_name, taggerdate,
+				 new_generation, new_distance, from_tag);
 	}
 }
 
@@ -276,11 +277,17 @@  static int name_ref(const char *path, const struct object_id *oid, int flags, vo
 		path = name_ref_abbrev(path, can_abbreviate_output);
 		if (commit->date >= cutoff) {
 			const char *tip_name;
+			char *to_free = NULL;
 			if (deref)
-				tip_name = xstrfmt("%s^0", path);
+				tip_name = to_free = xstrfmt("%s^0", path);
 			else
 				tip_name = xstrdup(path);
-			name_rev(commit, tip_name, taggerdate, 0, 0, from_tag);
+			if (create_or_update_name(commit, tip_name, taggerdate,
+						  0, 0, from_tag))
+				name_rev(commit, tip_name, taggerdate, 0, 0,
+					 from_tag);
+			else
+				free(to_free);
 		}
 	}
 	return 0;