[RFC,v2,10/13] walken: add unfiltered object walk from HEAD
diff mbox series

Message ID 20190626235032.177551-11-emilyshaffer@google.com
State New
Headers show
Series
  • example implementation of revwalk tutorial
Related show

Commit Message

Emily Shaffer June 26, 2019, 11:50 p.m. UTC
Provide a demonstration of a revision walk which traverses all types of
object, not just commits. This type of revision walk is used for
operations such as creating packfiles and performing fetches or clones,
so it's useful to teach new developers how it works. For starters, only
demonstrate the unfiltered version, as this will make the tutorial
easier to follow.

This commit is part of a tutorial on revision walking.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Change-Id: If3b11652ba011b28d29b1c3984dac4a3f80a5f53
---
 builtin/walken.c | 88 +++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 79 insertions(+), 9 deletions(-)

Comments

Eric Sunshine June 27, 2019, 5:37 a.m. UTC | #1
On Wed, Jun 26, 2019 at 7:51 PM Emily Shaffer <emilyshaffer@google.com> wrote:
> Provide a demonstration of a revision walk which traverses all types of
> object, not just commits. This type of revision walk is used for
> operations such as creating packfiles and performing fetches or clones,
> so it's useful to teach new developers how it works. For starters, only
> demonstrate the unfiltered version, as this will make the tutorial
> easier to follow.
> [...]
> Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> ---
> diff --git a/builtin/walken.c b/builtin/walken.c
> @@ -103,6 +109,65 @@ static int git_walken_config(const char *var, const char *value, void *cb)
> +static void walken_show_commit(struct commit *cmt, void *buf)
> +{
> +       commit_count++;
> +}
> +
> +static void walken_show_object(struct object *obj, const char *str, void *buf)
> +{
> +       switch (obj->type) {
> +       [...]
> +       case OBJ_TAG:
> +               tag_count++;
> +               break;
> +       case OBJ_COMMIT:
> +               die(_("unexpectedly encountered a commit in "
> +                     "walken_show_object\n"));
> +               commit_count++;

The "commit_count++" line is not only dead code, but it also confuses
the reader (or makes the reader think that the author of this code
doesn't understand C programming). You should drop this line.

> +               break;
> +       default:
> +               die(_("unexpected object type %s\n"), type_name(obj->type));
> +               break;

Likewise, this "break" (and the one in the OBJ_COMMIT case) are dead
code which should be dropped to avoid confusing the reader.

Don't localize the die() message via _() here or in the preceding
OBJ_COMMIT case.

The two die() messages are unnecessarily dissimilar. Try to unify them
so that they read in the same way.

> +       }
> +}> @@ -113,15 +178,15 @@ static void walken_commit_walk(struct rev_info *rev)
>         /*
> -         * prepare_revision_walk() gets the final steps ready for a revision
> +        * prepare_revision_walk() gets the final steps ready for a revision
>          * walk. We check the return value for errors.
> -         */
> +        */
>         /*
> -         * Now we can start the real commit walk. get_revision grabs the next
> +        * Now we can start the real commit walk. get_revision grabs the next
>          * revision based on the contents of rev.
>          */

I think these are just correcting whitespace/indentation errors I
pointed out in an earlier patch (so they are unnecessary noise in this
patch).
Emily Shaffer June 27, 2019, 10:31 p.m. UTC | #2
On Thu, Jun 27, 2019 at 01:37:58AM -0400, Eric Sunshine wrote:
> On Wed, Jun 26, 2019 at 7:51 PM Emily Shaffer <emilyshaffer@google.com> wrote:
> > Provide a demonstration of a revision walk which traverses all types of
> > object, not just commits. This type of revision walk is used for
> > operations such as creating packfiles and performing fetches or clones,
> > so it's useful to teach new developers how it works. For starters, only
> > demonstrate the unfiltered version, as this will make the tutorial
> > easier to follow.
> > [...]
> > Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> > ---
> > diff --git a/builtin/walken.c b/builtin/walken.c
> > @@ -103,6 +109,65 @@ static int git_walken_config(const char *var, const char *value, void *cb)
> > +static void walken_show_commit(struct commit *cmt, void *buf)
> > +{
> > +       commit_count++;
> > +}
> > +
> > +static void walken_show_object(struct object *obj, const char *str, void *buf)
> > +{
> > +       switch (obj->type) {
> > +       [...]
> > +       case OBJ_TAG:
> > +               tag_count++;
> > +               break;
> > +       case OBJ_COMMIT:
> > +               die(_("unexpectedly encountered a commit in "
> > +                     "walken_show_object\n"));
> > +               commit_count++;
> 
> The "commit_count++" line is not only dead code, but it also confuses
> the reader (or makes the reader think that the author of this code
> doesn't understand C programming). You should drop this line.

Ow, yes. Removed. This is stale (pre-die()).

> 
> > +               break;
> > +       default:
> > +               die(_("unexpected object type %s\n"), type_name(obj->type));
> > +               break;
> 
> Likewise, this "break" (and the one in the OBJ_COMMIT case) are dead
> code which should be dropped to avoid confusing the reader.

Done.

> 
> Don't localize the die() message via _() here or in the preceding
> OBJ_COMMIT case.

I'm a little surprised by that. Is it because die() is expected to only
be seen by the developer? It seems like a poor user experience if
someone in non-English locale encounters a bug that Git team didn't
find, and needed to try to translate the English die() string and figure
out if a workaround is possible.

> 
> The two die() messages are unnecessarily dissimilar. Try to unify them
> so that they read in the same way.

I'm a little surprised by this too; it seems to me the root cause of
each would be different. In the former case, I'd guess that
traverse_commit_list()'s behavior changed, and in the latter case I'd
guess that a new object type was recently added to the model. Can you
help me understand the motivation for making the messages similar?

(I don't think, though, that I did a good job of indicating either root
cause in the die() messages as they are now.)

> 
> > +       }
> > +}> @@ -113,15 +178,15 @@ static void walken_commit_walk(struct rev_info *rev)
> >         /*
> > -         * prepare_revision_walk() gets the final steps ready for a revision
> > +        * prepare_revision_walk() gets the final steps ready for a revision
> >          * walk. We check the return value for errors.
> > -         */
> > +        */
> >         /*
> > -         * Now we can start the real commit walk. get_revision grabs the next
> > +        * Now we can start the real commit walk. get_revision grabs the next
> >          * revision based on the contents of rev.
> >          */
> 
> I think these are just correcting whitespace/indentation errors I
> pointed out in an earlier patch (so they are unnecessary noise in this
> patch).

ACK
Eric Sunshine June 28, 2019, 12:48 a.m. UTC | #3
On Thu, Jun 27, 2019 at 6:31 PM Emily Shaffer <emilyshaffer@google.com> wrote:
> On Thu, Jun 27, 2019 at 01:37:58AM -0400, Eric Sunshine wrote:
> > Don't localize the die() message via _() here or in the preceding
> > OBJ_COMMIT case.
>
> I'm a little surprised by that. Is it because die() is expected to only
> be seen by the developer?

Sorry, I was reading those as BUG(), not die(), and we don't localize
BUG() text. But, why aren't those BUG()? Can those cases arise in
practice? (Genuine question; I haven't familiarized myself with that
code yet.)

If they legitimately should be die(), then ignore my comment about not
localizing them.

> > The two die() messages are unnecessarily dissimilar. Try to unify them
> > so that they read in the same way.
>
> I'm a little surprised by this too; it seems to me the root cause of
> each would be different. In the former case, I'd guess that
> traverse_commit_list()'s behavior changed, and in the latter case I'd
> guess that a new object type was recently added to the model. Can you
> help me understand the motivation for making the messages similar?

Both causes you describe here sound like BUG() cases, not die(). If
I'm understanding correctly, they could only trigger if someone made
some breaking or behavior changing modifications within Git and failed
to update all the code in the project impacted by the change. In other
words, these can't be triggered by user input, hence they would be
BUG()s indicating that a Git developer needs to fix the code.

As for the messages themselves, I was referring to the grammatical
dissimilarity of "unexpectedly" and "unexpected", and I also don't
understand why one messages mentions walken_show_object() explicitly,
whereas the other does not.
Emily Shaffer July 1, 2019, 7:19 p.m. UTC | #4
On Thu, Jun 27, 2019 at 08:48:31PM -0400, Eric Sunshine wrote:
> On Thu, Jun 27, 2019 at 6:31 PM Emily Shaffer <emilyshaffer@google.com> wrote:
> > On Thu, Jun 27, 2019 at 01:37:58AM -0400, Eric Sunshine wrote:
> > > Don't localize the die() message via _() here or in the preceding
> > > OBJ_COMMIT case.
> >
> > I'm a little surprised by that. Is it because die() is expected to only
> > be seen by the developer?
> 
> Sorry, I was reading those as BUG(), not die(), and we don't localize
> BUG() text. But, why aren't those BUG()? Can those cases arise in
> practice? (Genuine question; I haven't familiarized myself with that
> code yet.)
> 
> If they legitimately should be die(), then ignore my comment about not
> localizing them.

Hmmm. Yeah, I'll switch them to BUG() - I think there are other
instances of die() in the example and it'd be good to describe yet
another way of reporting behavior.

> 
> > > The two die() messages are unnecessarily dissimilar. Try to unify them
> > > so that they read in the same way.
> >
> > I'm a little surprised by this too; it seems to me the root cause of
> > each would be different. In the former case, I'd guess that
> > traverse_commit_list()'s behavior changed, and in the latter case I'd
> > guess that a new object type was recently added to the model. Can you
> > help me understand the motivation for making the messages similar?
> 
> Both causes you describe here sound like BUG() cases, not die(). If
> I'm understanding correctly, they could only trigger if someone made
> some breaking or behavior changing modifications within Git and failed
> to update all the code in the project impacted by the change. In other
> words, these can't be triggered by user input, hence they would be
> BUG()s indicating that a Git developer needs to fix the code.
> 
> As for the messages themselves, I was referring to the grammatical
> dissimilarity of "unexpectedly" and "unexpected", and I also don't
> understand why one messages mentions walken_show_object() explicitly,
> whereas the other does not.

I see - ok, I have reworded.


Thanks!
 - Emily

Patch
diff mbox series

diff --git a/builtin/walken.c b/builtin/walken.c
index 958923c172..42b23ba1ec 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -11,6 +11,7 @@ 
 #include "parse-options.h"
 #include "pretty.h"
 #include "line-log.h"
+#include "list-objects.h"
 #include "grep.h"
 
 /*
@@ -22,6 +23,11 @@  const char * const walken_usage[] = {
 	NULL,
 };
 
+static int commit_count;
+static int tag_count;
+static int blob_count;
+static int tree_count;
+
 /*
  * Within init_walken_defaults() we can call into other useful defaults to set
  * in the global scope or on the_repository. It's okay to borrow from other
@@ -103,6 +109,65 @@  static int git_walken_config(const char *var, const char *value, void *cb)
 	return git_default_config(var, value, cb);
 }
 
+static void walken_show_commit(struct commit *cmt, void *buf)
+{
+	commit_count++;
+}
+
+static void walken_show_object(struct object *obj, const char *str, void *buf)
+{
+	switch (obj->type) {
+	case OBJ_TREE:
+		tree_count++;
+		break;
+	case OBJ_BLOB:
+		blob_count++;
+		break;
+	case OBJ_TAG:
+		tag_count++;
+		break;
+	case OBJ_COMMIT:
+		die(_("unexpectedly encountered a commit in "
+		      "walken_show_object\n"));
+		commit_count++;
+		break;
+	default:
+		die(_("unexpected object type %s\n"), type_name(obj->type));
+		break;
+	}
+}
+
+/*
+ * walken_object_walk() is invoked by cmd_walken() after initialization. It does
+ * a walk of all object types.
+ */
+static void walken_object_walk(struct rev_info *rev)
+{
+	rev->tree_objects = 1;
+	rev->blob_objects = 1;
+	rev->tag_objects = 1;
+	rev->tree_blobs_in_commit_order = 1;
+	rev->exclude_promisor_objects = 1;
+
+	if (prepare_revision_walk(rev))
+		die(_("revision walk setup failed"));
+
+	commit_count = 0;
+	tag_count = 0;
+	blob_count = 0;
+	tree_count = 0;
+
+	traverse_commit_list(rev, walken_show_commit, walken_show_object, NULL);
+
+	/*
+	 * This print statement is designed to be script-parseable. Script
+	 * authors will rely on the output not to change, so we will not
+	 * localize this string. It will go to stdout directly.
+	 */
+	printf("commits %d\n blobs %d\n tags %d\n trees %d\n", commit_count,
+	       blob_count, tag_count, tree_count);
+}
+
 /*
  * walken_commit_walk() is invoked by cmd_walken() after initialization. It
  * does the commit walk only.
@@ -113,15 +178,15 @@  static void walken_commit_walk(struct rev_info *rev)
 	struct strbuf prettybuf = STRBUF_INIT;
 
 	/*
-         * prepare_revision_walk() gets the final steps ready for a revision
+	 * prepare_revision_walk() gets the final steps ready for a revision
 	 * walk. We check the return value for errors.
-         */
+	 */
 	if (prepare_revision_walk(rev)) {
 		die(_("revision walk setup failed"));
 	}
 
 	/*
-         * Now we can start the real commit walk. get_revision grabs the next
+	 * Now we can start the real commit walk. get_revision grabs the next
 	 * revision based on the contents of rev.
 	 */
 	rev->diffopt.close_file = 0;
@@ -166,13 +231,18 @@  int cmd_walken(int argc, const char **argv, const char *prefix)
 	/* We can set our traversal flags here. */
 	rev.always_show_header = 1;
 
-	/*
-	 * Before we do the walk, we need to set a starting point by giving it
-	 * something to go in `pending` - that happens in here
-	 */
-	final_rev_info_setup(argc, argv, prefix, &rev);
 
-	walken_commit_walk(&rev);
+	if (1) {
+		add_head_to_pending(&rev);
+		walken_object_walk(&rev);
+	} else {
+		/*
+		 * Before we do the walk, we need to set a starting point by giving it
+		 * something to go in `pending` - that happens in here
+		 */
+		final_rev_info_setup(argc, argv, prefix, &rev);
+		walken_commit_walk(&rev);
+	}
 
 	/*
 	 * This line is "human-readable" and we are writing a plumbing command,