Message ID | 20190301175024.17337-5-alban.gruin@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | name-rev: improve memory usage | expand |
On Fri, Mar 1, 2019 at 12:50 PM Alban Gruin <alban.gruin@gmail.com> wrote: > A ref may not be the descendant of all of the commits mentionned in > stdin. In this case, we want to avoid naming its parents. s/mentionned/mentioned/ > > To do this, find_commits_in_strbuf() is created. It returns a raw list > of all the commits that have been found in the input buffer. ishex() is > converted to an inlined function. Then, we add a raw list of commits in > the name_ref_data structure, and call find_commits_in_strbuf() before > for_each_ref() if the user wants name-ref to process stdin. Then, for > each ref, we check if the reachable subset of this commit list is empty > or not. If it is, we do not call name_rev(), so we don’t name its > parents. > > The code dealing with stdin after calling for_each_ref() is no longer > needed as we already read it. name_rev_line() is renamed name_rev_buf() > to reflect its new role better. > > Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
On Fri, Mar 1, 2019 at 7:11 PM Alban Gruin <alban.gruin@gmail.com> wrote: > > A ref may not be the descendant of all of the commits mentionned in > stdin. In this case, we want to avoid naming its parents. Properly speaking a ref usually just points to a commit. It is not a parent or a descendant of any commit. Also if the commit, let's call it C, pointed to by a ref isn't a descendant of a commit, let's call it C', mentioned on stdin, then C' isn't a parent of C, though I think we want to avoid naming C' from the ref. I think it might be clearer with something like: "The commit, let's call it C, pointed to by a ref may not be a descendant of all the commits mentioned on stdin. In this case we want to avoid naming the commits mentioned on stdin that are not parents of C using the ref." Or is there something I misunderstood?
On Sun, Mar 3, 2019 at 8:33 PM Christian Couder <christian.couder@gmail.com> wrote: > > On Fri, Mar 1, 2019 at 7:11 PM Alban Gruin <alban.gruin@gmail.com> wrote: > > > > A ref may not be the descendant of all of the commits mentionned in > > stdin. In this case, we want to avoid naming its parents. > > Properly speaking a ref usually just points to a commit. It is not a > parent or a descendant of any commit. > > Also if the commit, let's call it C, pointed to by a ref isn't a > descendant of a commit, let's call it C', mentioned on stdin, then C' > isn't a parent of C, though I think we want to avoid naming C' from > the ref. > > I think it might be clearer with something like: > > "The commit, let's call it C, pointed to by a ref may not be a > descendant of all the commits mentioned on stdin. In this case we want > to avoid naming the commits mentioned on stdin that are not parents of > C using the ref." By the way things might even be clearer by replacing "parent" with "ancestor" in my message.
Hi Christian, Le 03/03/2019 à 20:33, Christian Couder a écrit : > On Fri, Mar 1, 2019 at 7:11 PM Alban Gruin <alban.gruin@gmail.com> wrote: >> >> A ref may not be the descendant of all of the commits mentionned in >> stdin. In this case, we want to avoid naming its parents. > > Properly speaking a ref usually just points to a commit. It is not a > parent or a descendant of any commit. > > Also if the commit, let's call it C, pointed to by a ref isn't a > descendant of a commit, let's call it C', mentioned on stdin, then C' > isn't a parent of C, though I think we want to avoid naming C' from > the ref. > > I think it might be clearer with something like: > > "The commit, let's call it C, pointed to by a ref may not be a > descendant of all the commits mentioned on stdin. In this case we want > to avoid naming the commits mentioned on stdin that are not parents of > C using the ref." > > Or is there something I misunderstood? > It’s almost this. We do want to name *all* of the commits mentionned in stdin, as far as possible. We also want to lower the amount of memory allocation, and one of the possibilities is to avoid traversing the tree from a commit pointed to by a ref if none of its parents are mentionned in stdin. Cheers, Alban
diff --git a/builtin/name-rev.c b/builtin/name-rev.c index 2f89ed50a1..f5860f5625 100644 --- a/builtin/name-rev.c +++ b/builtin/name-rev.c @@ -8,6 +8,7 @@ #include "parse-options.h" #include "sha1-lookup.h" #include "commit-slab.h" +#include "commit-reach.h" #define CUTOFF_DATE_SLOP 86400 /* one day */ @@ -183,6 +184,8 @@ struct name_ref_data { int name_only; struct string_list ref_filters; struct string_list exclude_filters; + struct commit **commits; + int commits_nr; }; static struct tip_table { @@ -279,13 +282,21 @@ static int name_ref(const char *path, const struct object_id *oid, int flags, vo } if (o && o->type == OBJ_COMMIT) { struct commit *commit = (struct commit *)o; + struct commit_list *reachable_commits = NULL; int from_tag = starts_with(path, "refs/tags/"); - if (taggerdate == TIME_MAX) - taggerdate = ((struct commit *)o)->date; - path = name_ref_abbrev(path, can_abbreviate_output); - name_rev(commit, xstrdup(path), taggerdate, 0, 0, - from_tag, deref, NULL); + reachable_commits = get_reachable_subset(&commit, 1, + data->commits, data->commits_nr, 0); + + if (commit_list_count(reachable_commits) > 0 || data->commits_nr == 0) { + if (taggerdate == TIME_MAX) + taggerdate = ((struct commit *)o)->date; + path = name_ref_abbrev(path, can_abbreviate_output); + name_rev(commit, xstrdup(path), taggerdate, 0, 0, + from_tag, deref, reachable_commits); + } + + free_commit_list(reachable_commits); } return 0; } @@ -369,13 +380,53 @@ static char const * const name_rev_usage[] = { NULL }; -static void name_rev_line(char *p, struct name_ref_data *data) +static inline int ishex(char p) +{ + return isdigit(p) || (p >= 'a' && p <= 'f'); +} + +static struct commit **find_commits_in_strbuf(struct strbuf *buf, int *count) +{ + int forty = 0; + char *p; + struct commit **commits = NULL; + + *count = 0; + + for (p = buf->buf; *p; p++) { + if (!ishex(*p)) + forty = 0; + else if (++forty == GIT_SHA1_HEXSZ && + !ishex(*(p+1))) { + struct object_id oid; + char c = *(p+1); + + *(p+1) = 0; + if (!get_oid(p - (GIT_SHA1_HEXSZ -1), &oid)) { + struct object *o = + parse_object(the_repository, &oid); + + if (o && o->type == OBJ_COMMIT) { + struct commit *c = (struct commit *) o; + + REALLOC_ARRAY(commits, (*count) + 1); + commits[(*count)++] = c; + set_commit_rev_name(c, NULL); + } + } + *(p+1) = c; + } + } + + return commits; +} + +static void name_rev_buf(char *p, struct name_ref_data *data) { struct strbuf buf = STRBUF_INIT; int forty = 0; char *p_start; for (p_start = p; *p; p++) { -#define ishex(x) (isdigit((x)) || ((x) >= 'a' && (x) <= 'f')) if (!ishex(*p)) forty = 0; else if (++forty == GIT_SHA1_HEXSZ && @@ -419,7 +470,7 @@ int cmd_name_rev(int argc, const char **argv, const char *prefix) { struct object_array revs = OBJECT_ARRAY_INIT; int all = 0, transform_stdin = 0, allow_undefined = 1, always = 0, peel_tag = 0; - struct name_ref_data data = { 0, 0, STRING_LIST_INIT_NODUP, STRING_LIST_INIT_NODUP }; + struct name_ref_data data = { 0, 0, STRING_LIST_INIT_NODUP, STRING_LIST_INIT_NODUP, NULL, 0 }; struct option opts[] = { OPT_BOOL(0, "name-only", &data.name_only, N_("print only names (no SHA-1)")), OPT_BOOL(0, "tags", &data.tags_only, N_("only use tags to name the commits")), @@ -496,20 +547,27 @@ int cmd_name_rev(int argc, const char **argv, const char *prefix) if (cutoff) cutoff = cutoff - CUTOFF_DATE_SLOP; - for_each_ref(name_ref, &data); if (transform_stdin) { - char buffer[2048]; + struct strbuf buf = STRBUF_INIT; - while (!feof(stdin)) { - char *p = fgets(buffer, sizeof(buffer), stdin); - if (!p) - break; - name_rev_line(p, &data); + strbuf_read(&buf, STDIN_FILENO, 0); + data.commits = find_commits_in_strbuf(&buf, &data.commits_nr); + + if (data.commits_nr > 0) { + for_each_ref(name_ref, &data); + name_rev_buf(buf.buf, &data); + } else { + fwrite(buf.buf, buf.len, 1, stdout); } + + free(data.commits); + strbuf_release(&buf); } else if (all) { int i, max; + for_each_ref(name_ref, &data); + max = get_max_object_index(); for (i = 0; i < max; i++) { struct object *obj = get_indexed_object(i); @@ -520,6 +578,9 @@ int cmd_name_rev(int argc, const char **argv, const char *prefix) } } else { int i; + + for_each_ref(name_ref, &data); + for (i = 0; i < revs.nr; i++) show_name(revs.objects[i].item, revs.objects[i].name, always, allow_undefined, data.name_only);
A ref may not be the descendant of all of the commits mentionned in stdin. In this case, we want to avoid naming its parents. To do this, find_commits_in_strbuf() is created. It returns a raw list of all the commits that have been found in the input buffer. ishex() is converted to an inlined function. Then, we add a raw list of commits in the name_ref_data structure, and call find_commits_in_strbuf() before for_each_ref() if the user wants name-ref to process stdin. Then, for each ref, we check if the reachable subset of this commit list is empty or not. If it is, we do not call name_rev(), so we don’t name its parents. The code dealing with stdin after calling for_each_ref() is no longer needed as we already read it. name_rev_line() is renamed name_rev_buf() to reflect its new role better. Signed-off-by: Alban Gruin <alban.gruin@gmail.com> --- builtin/name-rev.c | 91 ++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 76 insertions(+), 15 deletions(-)