[RFC,4/4] name-rev: avoid naming from a ref if it’s not a descendant of any commit
diff mbox series

Message ID 20190301175024.17337-5-alban.gruin@gmail.com
State New
Headers show
Series
  • name-rev: improve memory usage
Related show

Commit Message

Alban Gruin March 1, 2019, 5:50 p.m. UTC
A ref may not be the descendant of all of the commits mentionned in
stdin.  In this case, we want to avoid naming its parents.

To do this, find_commits_in_strbuf() is created.  It returns a raw list
of all the commits that have been found in the input buffer.  ishex() is
converted to an inlined function.  Then, we add a raw list of commits in
the name_ref_data structure, and call find_commits_in_strbuf() before
for_each_ref() if the user wants name-ref to process stdin.  Then, for
each ref, we check if the reachable subset of this commit list is empty
or not.  If it is, we do not call name_rev(), so we don’t name its
parents.

The code dealing with stdin after calling for_each_ref() is no longer
needed as we already read it.  name_rev_line() is renamed name_rev_buf()
to reflect its new role better.

Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
---
 builtin/name-rev.c | 91 ++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 76 insertions(+), 15 deletions(-)

Comments

Eric Sunshine March 1, 2019, 6:07 p.m. UTC | #1
On Fri, Mar 1, 2019 at 12:50 PM Alban Gruin <alban.gruin@gmail.com> wrote:
> A ref may not be the descendant of all of the commits mentionned in
> stdin.  In this case, we want to avoid naming its parents.

s/mentionned/mentioned/

>
> To do this, find_commits_in_strbuf() is created.  It returns a raw list
> of all the commits that have been found in the input buffer.  ishex() is
> converted to an inlined function.  Then, we add a raw list of commits in
> the name_ref_data structure, and call find_commits_in_strbuf() before
> for_each_ref() if the user wants name-ref to process stdin.  Then, for
> each ref, we check if the reachable subset of this commit list is empty
> or not.  If it is, we do not call name_rev(), so we don’t name its
> parents.
>
> The code dealing with stdin after calling for_each_ref() is no longer
> needed as we already read it.  name_rev_line() is renamed name_rev_buf()
> to reflect its new role better.
>
> Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
Christian Couder March 3, 2019, 7:33 p.m. UTC | #2
On Fri, Mar 1, 2019 at 7:11 PM Alban Gruin <alban.gruin@gmail.com> wrote:
>
> A ref may not be the descendant of all of the commits mentionned in
> stdin.  In this case, we want to avoid naming its parents.

Properly speaking a ref usually just points to a commit. It is not a
parent or a descendant of any commit.

Also if the commit, let's call it C, pointed to by a ref isn't a
descendant of a commit, let's call it C', mentioned on stdin, then C'
isn't a parent of C, though I think we want to avoid naming C' from
the ref.

I think it might be clearer with something like:

"The commit, let's call it C, pointed to by a ref may not be a
descendant of all the commits mentioned on stdin. In this case we want
to avoid naming the commits mentioned on stdin that are not parents of
C using the ref."

Or is there something I misunderstood?
Christian Couder March 3, 2019, 7:46 p.m. UTC | #3
On Sun, Mar 3, 2019 at 8:33 PM Christian Couder
<christian.couder@gmail.com> wrote:
>
> On Fri, Mar 1, 2019 at 7:11 PM Alban Gruin <alban.gruin@gmail.com> wrote:
> >
> > A ref may not be the descendant of all of the commits mentionned in
> > stdin.  In this case, we want to avoid naming its parents.
>
> Properly speaking a ref usually just points to a commit. It is not a
> parent or a descendant of any commit.
>
> Also if the commit, let's call it C, pointed to by a ref isn't a
> descendant of a commit, let's call it C', mentioned on stdin, then C'
> isn't a parent of C, though I think we want to avoid naming C' from
> the ref.
>
> I think it might be clearer with something like:
>
> "The commit, let's call it C, pointed to by a ref may not be a
> descendant of all the commits mentioned on stdin. In this case we want
> to avoid naming the commits mentioned on stdin that are not parents of
> C using the ref."

By the way things might even be clearer by replacing "parent" with
"ancestor" in my message.
Alban Gruin March 3, 2019, 8:27 p.m. UTC | #4
Hi Christian,

Le 03/03/2019 à 20:33, Christian Couder a écrit :
> On Fri, Mar 1, 2019 at 7:11 PM Alban Gruin <alban.gruin@gmail.com> wrote:
>>
>> A ref may not be the descendant of all of the commits mentionned in
>> stdin.  In this case, we want to avoid naming its parents.
> 
> Properly speaking a ref usually just points to a commit. It is not a
> parent or a descendant of any commit.
> 
> Also if the commit, let's call it C, pointed to by a ref isn't a
> descendant of a commit, let's call it C', mentioned on stdin, then C'
> isn't a parent of C, though I think we want to avoid naming C' from
> the ref.
> 
> I think it might be clearer with something like:
> 
> "The commit, let's call it C, pointed to by a ref may not be a
> descendant of all the commits mentioned on stdin. In this case we want
> to avoid naming the commits mentioned on stdin that are not parents of
> C using the ref."
> 
> Or is there something I misunderstood?
> 

It’s almost this.  We do want to name *all* of the commits mentionned in
stdin, as far as possible.  We also want to lower the amount of memory
allocation, and one of the possibilities is to avoid traversing the tree
from a commit pointed to by a ref if none of its parents are mentionned
in stdin.

Cheers,
Alban

Patch
diff mbox series

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 2f89ed50a1..f5860f5625 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -8,6 +8,7 @@ 
 #include "parse-options.h"
 #include "sha1-lookup.h"
 #include "commit-slab.h"
+#include "commit-reach.h"
 
 #define CUTOFF_DATE_SLOP 86400 /* one day */
 
@@ -183,6 +184,8 @@  struct name_ref_data {
 	int name_only;
 	struct string_list ref_filters;
 	struct string_list exclude_filters;
+	struct commit **commits;
+	int commits_nr;
 };
 
 static struct tip_table {
@@ -279,13 +282,21 @@  static int name_ref(const char *path, const struct object_id *oid, int flags, vo
 	}
 	if (o && o->type == OBJ_COMMIT) {
 		struct commit *commit = (struct commit *)o;
+		struct commit_list *reachable_commits = NULL;
 		int from_tag = starts_with(path, "refs/tags/");
 
-		if (taggerdate == TIME_MAX)
-			taggerdate = ((struct commit *)o)->date;
-		path = name_ref_abbrev(path, can_abbreviate_output);
-		name_rev(commit, xstrdup(path), taggerdate, 0, 0,
-			 from_tag, deref, NULL);
+		reachable_commits = get_reachable_subset(&commit, 1,
+							 data->commits, data->commits_nr, 0);
+
+		if (commit_list_count(reachable_commits) > 0 || data->commits_nr == 0) {
+			if (taggerdate == TIME_MAX)
+				taggerdate = ((struct commit *)o)->date;
+			path = name_ref_abbrev(path, can_abbreviate_output);
+			name_rev(commit, xstrdup(path), taggerdate, 0, 0,
+				 from_tag, deref, reachable_commits);
+		}
+
+		free_commit_list(reachable_commits);
 	}
 	return 0;
 }
@@ -369,13 +380,53 @@  static char const * const name_rev_usage[] = {
 	NULL
 };
 
-static void name_rev_line(char *p, struct name_ref_data *data)
+static inline int ishex(char p)
+{
+	return isdigit(p) || (p >= 'a' && p <= 'f');
+}
+
+static struct commit **find_commits_in_strbuf(struct strbuf *buf, int *count)
+{
+	int forty = 0;
+	char *p;
+	struct commit **commits = NULL;
+
+	*count = 0;
+
+	for (p = buf->buf; *p; p++) {
+		if (!ishex(*p))
+			forty = 0;
+		else if (++forty == GIT_SHA1_HEXSZ &&
+			 !ishex(*(p+1))) {
+			struct object_id oid;
+			char c = *(p+1);
+
+			*(p+1) = 0;
+			if (!get_oid(p - (GIT_SHA1_HEXSZ -1), &oid)) {
+				struct object *o =
+					parse_object(the_repository, &oid);
+
+				if (o && o->type == OBJ_COMMIT) {
+					struct commit *c = (struct commit *) o;
+
+					REALLOC_ARRAY(commits, (*count) + 1);
+					commits[(*count)++] = c;
+					set_commit_rev_name(c, NULL);
+				}
+			}
+			*(p+1) = c;
+		}
+	}
+
+	return commits;
+}
+
+static void name_rev_buf(char *p, struct name_ref_data *data)
 {
 	struct strbuf buf = STRBUF_INIT;
 	int forty = 0;
 	char *p_start;
 	for (p_start = p; *p; p++) {
-#define ishex(x) (isdigit((x)) || ((x) >= 'a' && (x) <= 'f'))
 		if (!ishex(*p))
 			forty = 0;
 		else if (++forty == GIT_SHA1_HEXSZ &&
@@ -419,7 +470,7 @@  int cmd_name_rev(int argc, const char **argv, const char *prefix)
 {
 	struct object_array revs = OBJECT_ARRAY_INIT;
 	int all = 0, transform_stdin = 0, allow_undefined = 1, always = 0, peel_tag = 0;
-	struct name_ref_data data = { 0, 0, STRING_LIST_INIT_NODUP, STRING_LIST_INIT_NODUP };
+	struct name_ref_data data = { 0, 0, STRING_LIST_INIT_NODUP, STRING_LIST_INIT_NODUP, NULL, 0 };
 	struct option opts[] = {
 		OPT_BOOL(0, "name-only", &data.name_only, N_("print only names (no SHA-1)")),
 		OPT_BOOL(0, "tags", &data.tags_only, N_("only use tags to name the commits")),
@@ -496,20 +547,27 @@  int cmd_name_rev(int argc, const char **argv, const char *prefix)
 
 	if (cutoff)
 		cutoff = cutoff - CUTOFF_DATE_SLOP;
-	for_each_ref(name_ref, &data);
 
 	if (transform_stdin) {
-		char buffer[2048];
+		struct strbuf buf = STRBUF_INIT;
 
-		while (!feof(stdin)) {
-			char *p = fgets(buffer, sizeof(buffer), stdin);
-			if (!p)
-				break;
-			name_rev_line(p, &data);
+		strbuf_read(&buf, STDIN_FILENO, 0);
+		data.commits = find_commits_in_strbuf(&buf, &data.commits_nr);
+
+		if (data.commits_nr > 0) {
+			for_each_ref(name_ref, &data);
+			name_rev_buf(buf.buf, &data);
+		} else {
+			fwrite(buf.buf, buf.len, 1, stdout);
 		}
+
+		free(data.commits);
+		strbuf_release(&buf);
 	} else if (all) {
 		int i, max;
 
+		for_each_ref(name_ref, &data);
+
 		max = get_max_object_index();
 		for (i = 0; i < max; i++) {
 			struct object *obj = get_indexed_object(i);
@@ -520,6 +578,9 @@  int cmd_name_rev(int argc, const char **argv, const char *prefix)
 		}
 	} else {
 		int i;
+
+		for_each_ref(name_ref, &data);
+
 		for (i = 0; i < revs.nr; i++)
 			show_name(revs.objects[i].item, revs.objects[i].name,
 				  always, allow_undefined, data.name_only);