diff mbox series

[RFC,3/4] name-rev: check if a commit should be named before naming it

Message ID 20190301175024.17337-4-alban.gruin@gmail.com (mailing list archive)
State New, archived
Headers show
Series name-rev: improve memory usage | expand

Commit Message

Alban Gruin March 1, 2019, 5:50 p.m. UTC
Until now, name_rev() named every commit it found, even it the name
would ultimately be unused.

This makes name_rev() take a commit_list and check if the commit it wants
to name is part of the list.  If it is, the commit is named, and
name_rev() signals to its caller that the name should not be freed.  If
it is not, the commit is left unnamed.  In this case, the name can still
be used by the first descendant of this commit (or one of its descendants).

Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
---
 builtin/name-rev.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

Comments

Eric Sunshine March 1, 2019, 6:05 p.m. UTC | #1
On Fri, Mar 1, 2019 at 12:50 PM Alban Gruin <alban.gruin@gmail.com> wrote:
> Until now, name_rev() named every commit it found, even it the name

s/even it/even if/

> would ultimately be unused.
> [...]
> Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
> ---
> diff --git a/builtin/name-rev.c b/builtin/name-rev.c
> @@ -107,12 +107,18 @@ static int name_rev(struct commit *commit,
>  copy_data:
> -               name->tip_name = tip_name;
> +               if (commit_list_contains(commits, commit) ||
> +                   commit_list_count(commits) == 0) {

Minor: It would probably be more efficient to check if the count is 0
first, and only search the list if not.

By the way, how big is 'commits' likely to get? Will the linear scan
done by commit_list_contains() become an issue? Should it be using a
hash table instead?

> +                       name->tip_name = tip_name;
> +                       free_alloc = 0;
> +               } else {
> +                       name->tip_name = NULL;
> +               }
Alban Gruin March 1, 2019, 6:22 p.m. UTC | #2
Hi Eric,

Le 01/03/2019 à 19:05, Eric Sunshine a écrit :
> On Fri, Mar 1, 2019 at 12:50 PM Alban Gruin <alban.gruin@gmail.com> wrote:
> -%<-
> Minor: It would probably be more efficient to check if the count is 0
> first, and only search the list if not.
> 
> By the way, how big is 'commits' likely to get? Will the linear scan
> done by commit_list_contains() become an issue? Should it be using a
> hash table instead?
> 

It depends on the amount of commits mentionned in stdin that are
reachable from the ref in name_ref().  If there is a lot of commit in
the input, it may effectively become a problem.

I thought of adding a field to the rev_name structure for this purpose.
 I think commit slabs are hash maps under the hood?

>> +                       name->tip_name = tip_name;
>> +                       free_alloc = 0;
>> +               } else {
>> +                       name->tip_name = NULL;
>> +               }

Cheers,
Alban
Jeff King March 1, 2019, 6:37 p.m. UTC | #3
On Fri, Mar 01, 2019 at 07:22:47PM +0100, Alban Gruin wrote:

> Le 01/03/2019 à 19:05, Eric Sunshine a écrit :
> > On Fri, Mar 1, 2019 at 12:50 PM Alban Gruin <alban.gruin@gmail.com> wrote:
> > -%<-
> > Minor: It would probably be more efficient to check if the count is 0
> > first, and only search the list if not.
> > 
> > By the way, how big is 'commits' likely to get? Will the linear scan
> > done by commit_list_contains() become an issue? Should it be using a
> > hash table instead?
> > 
> 
> It depends on the amount of commits mentionned in stdin that are
> reachable from the ref in name_ref().  If there is a lot of commit in
> the input, it may effectively become a problem.

Yeah, I think this is quadratic in the worst case.

> I thought of adding a field to the rev_name structure for this purpose.
>  I think commit slabs are hash maps under the hood?

They're not hash maps, but they're still O(1). Each commit struct is
allocated a unique integer id, and the slab just keeps a dynamic array
of one entry per commit.

So they're actually faster than a hashmap (it's a pure index, no oideq()
required).  The downside is that they can use more memory than a hashmap
if they're sparsely populated. However in your case, I think you'd load
a couple of "struct commit" objects at the start, mark them as "1" in
the slab, and then never mark any more as you traverse. So you'd only
allocate entries for those tightly-packed initial ones.

It seems like an ideal case for using a slab.

-Peff
diff mbox series

Patch

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 0719a9388d..2f89ed50a1 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -80,7 +80,7 @@  static int is_better_name(struct rev_name *name,
 static int name_rev(struct commit *commit,
 		const char *tip_name, timestamp_t taggerdate,
 		int generation, int distance, int from_tag,
-		int deref)
+		int deref, struct commit_list *commits)
 {
 	struct rev_name *name = get_commit_rev_name(commit);
 	struct commit_list *parents;
@@ -107,12 +107,18 @@  static int name_rev(struct commit *commit,
 	} else if (is_better_name(name, tip_name, taggerdate,
 				  generation, distance, from_tag)) {
 copy_data:
-		name->tip_name = tip_name;
+		if (commit_list_contains(commits, commit) ||
+		    commit_list_count(commits) == 0) {
+			name->tip_name = tip_name;
+			free_alloc = 0;
+		} else {
+			name->tip_name = NULL;
+		}
+
 		name->taggerdate = taggerdate;
 		name->generation = generation;
 		name->distance = distance;
 		name->from_tag = from_tag;
-		free_alloc = 0;
 	} else {
 		free(to_free);
 		return 1;
@@ -135,12 +141,12 @@  static int name_rev(struct commit *commit,
 
 			if (name_rev(parents->item, new_name, taggerdate, 0,
 				      distance + MERGE_TRAVERSAL_WEIGHT,
-				      from_tag, 0))
+				      from_tag, 0, commits))
 				free(new_name);
 		} else {
 			free_alloc &= name_rev(parents->item, tip_name, taggerdate,
 					       generation + 1, distance + 1,
-					       from_tag, 0);
+					       from_tag, 0, commits);
 		}
 	}
 
@@ -279,7 +285,7 @@  static int name_ref(const char *path, const struct object_id *oid, int flags, vo
 			taggerdate = ((struct commit *)o)->date;
 		path = name_ref_abbrev(path, can_abbreviate_output);
 		name_rev(commit, xstrdup(path), taggerdate, 0, 0,
-			 from_tag, deref);
+			 from_tag, deref, NULL);
 	}
 	return 0;
 }