Message ID | 20190301175024.17337-4-alban.gruin@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | name-rev: improve memory usage | expand |
On Fri, Mar 1, 2019 at 12:50 PM Alban Gruin <alban.gruin@gmail.com> wrote: > Until now, name_rev() named every commit it found, even it the name s/even it/even if/ > would ultimately be unused. > [...] > Signed-off-by: Alban Gruin <alban.gruin@gmail.com> > --- > diff --git a/builtin/name-rev.c b/builtin/name-rev.c > @@ -107,12 +107,18 @@ static int name_rev(struct commit *commit, > copy_data: > - name->tip_name = tip_name; > + if (commit_list_contains(commits, commit) || > + commit_list_count(commits) == 0) { Minor: It would probably be more efficient to check if the count is 0 first, and only search the list if not. By the way, how big is 'commits' likely to get? Will the linear scan done by commit_list_contains() become an issue? Should it be using a hash table instead? > + name->tip_name = tip_name; > + free_alloc = 0; > + } else { > + name->tip_name = NULL; > + }
Hi Eric, Le 01/03/2019 à 19:05, Eric Sunshine a écrit : > On Fri, Mar 1, 2019 at 12:50 PM Alban Gruin <alban.gruin@gmail.com> wrote: > -%<- > Minor: It would probably be more efficient to check if the count is 0 > first, and only search the list if not. > > By the way, how big is 'commits' likely to get? Will the linear scan > done by commit_list_contains() become an issue? Should it be using a > hash table instead? > It depends on the amount of commits mentionned in stdin that are reachable from the ref in name_ref(). If there is a lot of commit in the input, it may effectively become a problem. I thought of adding a field to the rev_name structure for this purpose. I think commit slabs are hash maps under the hood? >> + name->tip_name = tip_name; >> + free_alloc = 0; >> + } else { >> + name->tip_name = NULL; >> + } Cheers, Alban
On Fri, Mar 01, 2019 at 07:22:47PM +0100, Alban Gruin wrote: > Le 01/03/2019 à 19:05, Eric Sunshine a écrit : > > On Fri, Mar 1, 2019 at 12:50 PM Alban Gruin <alban.gruin@gmail.com> wrote: > > -%<- > > Minor: It would probably be more efficient to check if the count is 0 > > first, and only search the list if not. > > > > By the way, how big is 'commits' likely to get? Will the linear scan > > done by commit_list_contains() become an issue? Should it be using a > > hash table instead? > > > > It depends on the amount of commits mentionned in stdin that are > reachable from the ref in name_ref(). If there is a lot of commit in > the input, it may effectively become a problem. Yeah, I think this is quadratic in the worst case. > I thought of adding a field to the rev_name structure for this purpose. > I think commit slabs are hash maps under the hood? They're not hash maps, but they're still O(1). Each commit struct is allocated a unique integer id, and the slab just keeps a dynamic array of one entry per commit. So they're actually faster than a hashmap (it's a pure index, no oideq() required). The downside is that they can use more memory than a hashmap if they're sparsely populated. However in your case, I think you'd load a couple of "struct commit" objects at the start, mark them as "1" in the slab, and then never mark any more as you traverse. So you'd only allocate entries for those tightly-packed initial ones. It seems like an ideal case for using a slab. -Peff
diff --git a/builtin/name-rev.c b/builtin/name-rev.c index 0719a9388d..2f89ed50a1 100644 --- a/builtin/name-rev.c +++ b/builtin/name-rev.c @@ -80,7 +80,7 @@ static int is_better_name(struct rev_name *name, static int name_rev(struct commit *commit, const char *tip_name, timestamp_t taggerdate, int generation, int distance, int from_tag, - int deref) + int deref, struct commit_list *commits) { struct rev_name *name = get_commit_rev_name(commit); struct commit_list *parents; @@ -107,12 +107,18 @@ static int name_rev(struct commit *commit, } else if (is_better_name(name, tip_name, taggerdate, generation, distance, from_tag)) { copy_data: - name->tip_name = tip_name; + if (commit_list_contains(commits, commit) || + commit_list_count(commits) == 0) { + name->tip_name = tip_name; + free_alloc = 0; + } else { + name->tip_name = NULL; + } + name->taggerdate = taggerdate; name->generation = generation; name->distance = distance; name->from_tag = from_tag; - free_alloc = 0; } else { free(to_free); return 1; @@ -135,12 +141,12 @@ static int name_rev(struct commit *commit, if (name_rev(parents->item, new_name, taggerdate, 0, distance + MERGE_TRAVERSAL_WEIGHT, - from_tag, 0)) + from_tag, 0, commits)) free(new_name); } else { free_alloc &= name_rev(parents->item, tip_name, taggerdate, generation + 1, distance + 1, - from_tag, 0); + from_tag, 0, commits); } } @@ -279,7 +285,7 @@ static int name_ref(const char *path, const struct object_id *oid, int flags, vo taggerdate = ((struct commit *)o)->date; path = name_ref_abbrev(path, can_abbreviate_output); name_rev(commit, xstrdup(path), taggerdate, 0, 0, - from_tag, deref); + from_tag, deref, NULL); } return 0; }
Until now, name_rev() named every commit it found, even it the name would ultimately be unused. This makes name_rev() take a commit_list and check if the commit it wants to name is part of the list. If it is, the commit is named, and name_rev() signals to its caller that the name should not be freed. If it is not, the commit is left unnamed. In this case, the name can still be used by the first descendant of this commit (or one of its descendants). Signed-off-by: Alban Gruin <alban.gruin@gmail.com> --- builtin/name-rev.c | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-)