[v2] rev-list: teach --oid-only to enable piping
diff mbox series

Message ID 20190613215102.44627-1-emilyshaffer@google.com
State New
Headers show
Series
  • [v2] rev-list: teach --oid-only to enable piping
Related show

Commit Message

Emily Shaffer June 13, 2019, 9:51 p.m. UTC
Allow easier parsing by cat-file by giving rev-list an option to print
only the OID of an object without any additional information. This is a
short-term shim; later on, rev-list should be taught how to print the
types of objects it finds in a format similar to cat-file's.

Before this commit, the output from rev-list needed to be massaged
before being piped to cat-file, like so:

  git rev-list --objects HEAD | cut -f 1 -d ' ' \
    | git cat-file --batch-check

This was especially unexpected when dealing with root trees, as an
invisible whitespace exists at the end of the OID:

  git rev-list --objects --filter=tree:1 --max-count=1 HEAD \
    | xargs -I% echo "AA%AA"

Now, it can be piped directly, as in the added test case:

  git rev-list --objects --oid-only HEAD | git cat-file --batch-check

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Change-Id: I489bdf0a8215532e540175188883ff7541d70e1b
---
It didn't appear that using an existing --pretty string would do the
trick, as the formatting options for --pretty apply specifically to
commit objects (you can specify the commit hash and the tree hash, but
I didn't see a way to more generally pretty-print all types of objects).
rev-list doesn't appear to use parse-options-h, so I added a new option
as simply as I could see without breaking existing style; it seemed
wrong to add a flag to the `struct rev_list_info` as the definition of
struct and flag values alike are contained in bisect.h. There are a
couple other options to rev-list which are stored as globals.

At the moment, this doesn't work with --abbrev, and although I think
it'd be a good fit, right now --abbrev doesn't seem to work without
--abbrev-commit, and both those options end up being eaten by
setup_revisions() rather than by rev-list itself.

 - Emily

 builtin/rev-list.c       | 21 ++++++++++++++++++++-
 t/t6000-rev-list-misc.sh | 18 ++++++++++++++++++
 2 files changed, 38 insertions(+), 1 deletion(-)

Comments

Jeff King June 14, 2019, 4:07 p.m. UTC | #1
On Thu, Jun 13, 2019 at 02:51:03PM -0700, Emily Shaffer wrote:

> It didn't appear that using an existing --pretty string would do the
> trick, as the formatting options for --pretty apply specifically to
> commit objects (you can specify the commit hash and the tree hash, but
> I didn't see a way to more generally pretty-print all types of objects).

Right, it would be a big change to start custom-formatting the
non-commits. I think this is a good step in the right direction.

> rev-list doesn't appear to use parse-options-h, so I added a new option
> as simply as I could see without breaking existing style; it seemed
> wrong to add a flag to the `struct rev_list_info` as the definition of
> struct and flag values alike are contained in bisect.h. There are a
> couple other options to rev-list which are stored as globals.

I think that's reasonable. This is definitely a rev-list option (not a
revision.c one). The interaction between bisect.h and rev-list is a bit
funny, but it's probably simpler not to try solving it here.

> +	if (arg_oid_only && revs.commit_format != CMIT_FMT_UNSPECIFIED)
> +		die(_("cannot combine --oid-only and --pretty or --header"));

As you've implemented it here, I definitely agree that this conflicts
with --pretty. But I think it also interacts badly with other options,
like "--graph".

TBH, I am not sure that "rev-list --graph --objects" is all that useful,
because the non-commit objects are not prefixed with the graph lines.
And without "--objects", this new option is not useful because the
default is already to show only the oid.

But I wonder if things would be simpler if we did not touch the commit
code path at all. I.e., if this were simply "--no-object-names", and it
touched only show_object(). And then you do not have to worry about
funny interactions with commit formatting at all. It would be valid to
do:

  git rev-list --pretty=raw --objects --no-object-names HEAD

if you really wanted to (though I admit I do not have an immediate use
for it).

> @@ -255,6 +262,10 @@ static void show_object(struct object *obj, const char *name, void *cb_data)
>  	display_progress(progress, ++progress_counter);
>  	if (info->flags & REV_LIST_QUIET)
>  		return;
> +	if (arg_oid_only) {
> +		printf("%s\n", oid_to_hex(&obj->oid));
> +		return;
> +	}
>  	show_object_with_name(stdout, obj, name);
>  }
>  

A minor style point, but I think this might be easier to follow without
the early return, since we are really choosing to do A or B. Writing:

  if (arg_oid_only)
	printf(...);
  else
	show_object_with_name(...);

shows that more clearly, I think.

> [...]

Otherwise, this looks good, including the tests. One thing I did notice
in the tests:

> +test_expect_success 'rev-list --objects --oid-only is usable by cat-file' '
> +	git rev-list --objects --oid-only --all >list-output &&
> +	git cat-file --batch-check <list-output >cat-output &&
> +	! grep missing cat-output
> +'

Usually we prefer to look for the expected output, rather than making
sure we did not find the unexpected. But I'm not sure if that might be
burdensome in this case (i.e., if there's a bunch of cruft coming out of
"rev-list" that would be annoying to match, and might even change as
people add more tests). So I'm OK with it either way.

-Peff
Junio C Hamano June 14, 2019, 8:25 p.m. UTC | #2
Jeff King <peff@peff.net> writes:

> But I wonder if things would be simpler if we did not touch the commit
> code path at all. I.e., if this were simply "--no-object-names", and it
> touched only show_object().

Yeah, that sounds more tempting.  And the refined code structure you
suggested ...

>> @@ -255,6 +262,10 @@ static void show_object(struct object *obj, const char *name, void *cb_data)
>>  	display_progress(progress, ++progress_counter);
>>  	if (info->flags & REV_LIST_QUIET)
>>  		return;
>> +	if (arg_oid_only) {
>> +		printf("%s\n", oid_to_hex(&obj->oid));
>> +		return;
>> +	}
>>  	show_object_with_name(stdout, obj, name);
>>  }
>>  
>
> A minor style point, but I think this might be easier to follow without
> the early return, since we are really choosing to do A or B. Writing:
>
>   if (arg_oid_only)
> 	printf(...);
>   else
> 	show_object_with_name(...);
>
> shows that more clearly, I think.

... is a good way to clearly show that intention, I would think.
Emily Shaffer June 14, 2019, 11:18 p.m. UTC | #3
On Fri, Jun 14, 2019 at 01:25:59PM -0700, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
> 
> > But I wonder if things would be simpler if we did not touch the commit
> > code path at all. I.e., if this were simply "--no-object-names", and it
> > touched only show_object().
> 
> Yeah, that sounds more tempting.  And the refined code structure you
> suggested ...
> 
> >> @@ -255,6 +262,10 @@ static void show_object(struct object *obj, const char *name, void *cb_data)
> >>  	display_progress(progress, ++progress_counter);
> >>  	if (info->flags & REV_LIST_QUIET)
> >>  		return;
> >> +	if (arg_oid_only) {
> >> +		printf("%s\n", oid_to_hex(&obj->oid));
> >> +		return;
> >> +	}
> >>  	show_object_with_name(stdout, obj, name);
> >>  }
> >>  
> >
> > A minor style point, but I think this might be easier to follow without
> > the early return, since we are really choosing to do A or B. Writing:
> >
> >   if (arg_oid_only)
> > 	printf(...);
> >   else
> > 	show_object_with_name(...);
> >
> > shows that more clearly, I think.
> 
> ... is a good way to clearly show that intention, I would think.

Sounds good. Thanks, both; I'll reroll that quickly today.
Emily Shaffer June 14, 2019, 11:29 p.m. UTC | #4
On Fri, Jun 14, 2019 at 12:07:28PM -0400, Jeff King wrote:
> On Thu, Jun 13, 2019 at 02:51:03PM -0700, Emily Shaffer wrote:

> > +test_expect_success 'rev-list --objects --oid-only is usable by cat-file' '
> > +	git rev-list --objects --oid-only --all >list-output &&
> > +	git cat-file --batch-check <list-output >cat-output &&
> > +	! grep missing cat-output
> > +'
> 
> Usually we prefer to look for the expected output, rather than making
> sure we did not find the unexpected. But I'm not sure if that might be
> burdensome in this case (i.e., if there's a bunch of cruft coming out of
> "rev-list" that would be annoying to match, and might even change as
> people add more tests). So I'm OK with it either way.

My (newbie) opinion is that in this case, we specifically want to know
that cat-file didn't choke on objects which we know exist (since they
came from rev-list). I have the feeling that checking for the exact
objects returned instead (or a sample of them) would be more brittle and
would also make the wording of the test less direct.

So if there's no complaint either way, I'd prefer to leave it the way it
is.

By the way, rev-list-misc.sh has a number of other existing "! grep ..."
lines.

 - Emily
Jeff King June 19, 2019, 9:24 p.m. UTC | #5
On Fri, Jun 14, 2019 at 04:29:46PM -0700, Emily Shaffer wrote:

> On Fri, Jun 14, 2019 at 12:07:28PM -0400, Jeff King wrote:
> > On Thu, Jun 13, 2019 at 02:51:03PM -0700, Emily Shaffer wrote:
> 
> > > +test_expect_success 'rev-list --objects --oid-only is usable by cat-file' '
> > > +	git rev-list --objects --oid-only --all >list-output &&
> > > +	git cat-file --batch-check <list-output >cat-output &&
> > > +	! grep missing cat-output
> > > +'
> > 
> > Usually we prefer to look for the expected output, rather than making
> > sure we did not find the unexpected. But I'm not sure if that might be
> > burdensome in this case (i.e., if there's a bunch of cruft coming out of
> > "rev-list" that would be annoying to match, and might even change as
> > people add more tests). So I'm OK with it either way.
> 
> My (newbie) opinion is that in this case, we specifically want to know
> that cat-file didn't choke on objects which we know exist (since they
> came from rev-list). I have the feeling that checking for the exact
> objects returned instead (or a sample of them) would be more brittle and
> would also make the wording of the test less direct.
> 
> So if there's no complaint either way, I'd prefer to leave it the way it
> is.

Yeah, that's fine with me if it seems more clear to use grep here (and I
was on the fence).

> By the way, rev-list-misc.sh has a number of other existing "! grep ..."
> lines.

It never fails that when I complain about a style issue, the surrounding
code is full of the same thing. ;) I'd have to look at each one to
determine if they're sensible or not, and it's probably not worth
anybody's time to do that cleanup at this point in time.

-Peff

Patch
diff mbox series

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 9f31837d30..ff07ea9564 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -48,7 +48,7 @@  static const char rev_list_usage[] =
 "    --children\n"
 "    --objects | --objects-edge\n"
 "    --unpacked\n"
-"    --header | --pretty\n"
+"    --header | --pretty | --oid-only\n"
 "    --abbrev=<n> | --no-abbrev\n"
 "    --abbrev-commit\n"
 "    --left-right\n"
@@ -75,6 +75,8 @@  enum missing_action {
 };
 static enum missing_action arg_missing_action;
 
+static int arg_oid_only; /* display only the oid of each object encountered */
+
 #define DEFAULT_OIDSET_SIZE     (16*1024)
 
 static void finish_commit(struct commit *commit, void *data);
@@ -90,6 +92,11 @@  static void show_commit(struct commit *commit, void *data)
 		return;
 	}
 
+	if (arg_oid_only) {
+		printf("%s\n", oid_to_hex(&commit->object.oid));
+		return;
+	}
+
 	graph_show_commit(revs->graph);
 
 	if (revs->count) {
@@ -255,6 +262,10 @@  static void show_object(struct object *obj, const char *name, void *cb_data)
 	display_progress(progress, ++progress_counter);
 	if (info->flags & REV_LIST_QUIET)
 		return;
+	if (arg_oid_only) {
+		printf("%s\n", oid_to_hex(&obj->oid));
+		return;
+	}
 	show_object_with_name(stdout, obj, name);
 }
 
@@ -484,6 +495,11 @@  int cmd_rev_list(int argc, const char **argv, const char *prefix)
 		if (skip_prefix(arg, "--missing=", &arg))
 			continue; /* already handled above */
 
+		if (!strcmp(arg, ("--oid-only"))) {
+			arg_oid_only = 1;
+			continue;
+		}
+
 		usage(rev_list_usage);
 
 	}
@@ -499,6 +515,9 @@  int cmd_rev_list(int argc, const char **argv, const char *prefix)
 		/* Only --header was specified */
 		revs.commit_format = CMIT_FMT_RAW;
 
+	if (arg_oid_only && revs.commit_format != CMIT_FMT_UNSPECIFIED)
+		die(_("cannot combine --oid-only and --pretty or --header"));
+
 	if ((!revs.commits && reflog_walk_empty(revs.reflog_info) &&
 	     (!(revs.tag_objects || revs.tree_objects || revs.blob_objects) &&
 	      !revs.pending.nr) &&
diff --git a/t/t6000-rev-list-misc.sh b/t/t6000-rev-list-misc.sh
index 0507999729..996ba1799b 100755
--- a/t/t6000-rev-list-misc.sh
+++ b/t/t6000-rev-list-misc.sh
@@ -48,6 +48,24 @@  test_expect_success 'rev-list --objects with pathspecs and copied files' '
 	! grep one output
 '
 
+test_expect_success 'rev-list --objects --oid-only has no whitespace/names' '
+	git rev-list --objects --oid-only HEAD >output &&
+	! grep wanted_file output &&
+	! grep unwanted_file output &&
+	! grep " " output
+'
+
+test_expect_success 'rev-list --objects --oid-only is usable by cat-file' '
+	git rev-list --objects --oid-only --all >list-output &&
+	git cat-file --batch-check <list-output >cat-output &&
+	! grep missing cat-output
+'
+
+test_expect_success 'rev-list --oid-only is incompatible with --pretty' '
+	test_must_fail git rev-list --objects --oid-only --pretty HEAD &&
+	test_must_fail git rev-list --objects --oid-only --header HEAD
+'
+
 test_expect_success 'rev-list A..B and rev-list ^A B are the same' '
 	git commit --allow-empty -m another &&
 	git tag -a -m "annotated" v1.0 &&