revision: remove stray whitespace when name empty
diff mbox series

Message ID 20190607225900.89299-1-emilyshaffer@google.com
State New
Headers show
Series
  • revision: remove stray whitespace when name empty
Related show

Commit Message

Emily Shaffer June 7, 2019, 10:59 p.m. UTC
Teach show_object_with_name() to avoid writing a space before a name
which is empty. Also teach tests for rev-list --objects --filter to not
require a space between the object ID and name.

show_object_with_name() inserts a space between an object's OID and name
regardless of whether the name is empty or not. This causes 'git
cat-file (--batch | --batch-check)' to fail to discover the type of root
directories:

  git rev-list --objects --filter=tree:1 --max-count=1 HEAD \
    | git cat-file --batch-check
  git rev-parse HEAD: | xargs -I% git cat-file -t %
  git rev-list --objects --filter=tree:1 --max-count=1 HEAD \
    | xargs -I% echo "AA%AA"
  git rev-list --objects --filter=tree:1 --max-count=1 HEAD \
    | cut -f 1 -d ' ' | git cat-file --batch-check

The extra space provided by 'show_object_with_name()' sticks to the
output of 'git rev-list' even if it's piped into a file.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
I don't see any reason _not_ to remove this stray whitespace at the end,
since it seems like it just gets in the way of easy scripting. I also
think this case will only present itself for root trees.

Not sure if making the whitespace optional is the right solution for the
test, although I couldn't come up with a cleaner approach. Maybe
something like this would be better, even though handling it in the
regex is shorter?

  if [[ -z "$name" ]] then
    grep "^$hash" actual
  else
    grep "^$hash $name" actual
  fi

 - Emily

 revision.c                          | 3 ++-
 t/t6112-rev-list-filters-objects.sh | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

Comments

Eric Sunshine June 8, 2019, 12:42 a.m. UTC | #1
On Fri, Jun 7, 2019 at 6:59 PM Emily Shaffer <emilyshaffer@google.com> wrote:
> Teach show_object_with_name() to avoid writing a space before a name
> which is empty. Also teach tests for rev-list --objects --filter to not
> require a space between the object ID and name.
> [...]
> Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> ---
> Not sure if making the whitespace optional is the right solution for the
> test, although I couldn't come up with a cleaner approach. Maybe
> something like this would be better, even though handling it in the
> regex is shorter?
>
>   if [[ -z "$name" ]] then

In Git, we avoid Bash-isms. Just use 'test'. And, style is to place
'then' on its own line.

    if test -z "$name"
    then

> diff --git a/revision.c b/revision.c
> @@ -40,7 +40,8 @@ void show_object_with_name(FILE *out, struct object *obj, const char *name)
>  {
> -       fprintf(out, "%s ", oid_to_hex(&obj->oid));
> +       fprintf(out, "%s%s", oid_to_hex(&obj->oid),
> +               strcmp(name, "") == 0 ? "" : " ");
>         for (p = name; *p && *p != '\n'; p++)
>                 fputc(*p, out);
>         fputc('\n', out);

It's subjective, but this starts to be a bit too noisy and unreadable.
An alternative:

    fputs(oid_to_hex(...), out);
    if (*name)
        putc(' ', out);

> diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
> @@ -288,7 +288,7 @@ expect_has () {
>         hash=$(git -C r3 rev-parse $commit:$name) &&
> -       grep "^$hash $name$" actual
> +       grep "^$hash \?$name$" actual

This would be the first use of \? with 'grep' in the test suite. I
would worry about portability. Your suggestion earlier to tailor
'grep' invocation based upon whether $name is empty would be safer.
Jeff King June 9, 2019, 1 p.m. UTC | #2
On Fri, Jun 07, 2019 at 03:59:00PM -0700, Emily Shaffer wrote:

> Teach show_object_with_name() to avoid writing a space before a name
> which is empty. Also teach tests for rev-list --objects --filter to not
> require a space between the object ID and name.
> [...]
> ---
> I don't see any reason _not_ to remove this stray whitespace at the end,
> since it seems like it just gets in the way of easy scripting. I also
> think this case will only present itself for root trees.

I'm a bit worried that this might break existing scripts. As ugly as
trailing whitespace is, it does tell you something here: that the object
is a root tree and not a commit.

So in the past I have done things like:

  git rev-list --objects --all | grep ' '

to get only the non-commits. I'm undecided on whether we're straying
into https://xkcd.com/1172/ territory here. I'd be more in favor if this
were making things significantly easier, but...

> show_object_with_name() inserts a space between an object's OID and name
> regardless of whether the name is empty or not. This causes 'git
> cat-file (--batch | --batch-check)' to fail to discover the type of root
> directories:
> 
>   git rev-list --objects --filter=tree:1 --max-count=1 HEAD \
>     | git cat-file --batch-check
>   git rev-parse HEAD: | xargs -I% git cat-file -t %
>   git rev-list --objects --filter=tree:1 --max-count=1 HEAD \
>     | xargs -I% echo "AA%AA"
>   git rev-list --objects --filter=tree:1 --max-count=1 HEAD \
>     | cut -f 1 -d ' ' | git cat-file --batch-check

Your patch only helps with this at all because you're using the "tree:1"
filter. It would not help:

  git rev-list --objects HEAD | git cat-file --batch-check

because there you'll have actual names which cat-file will choke on. So
it seems like this is helping only a very limited use case.

cat-file actually does know how to split on whitespace. Unfortunately it
does not do so by default, because that breaks some cases. See
97be04077f (cat-file: only split on whitespace when %(rest) is used,
2013-08-02).

So you _can_ do:

  git rev-list --objects HEAD |
  git cat-file --batch-check='%(objectname) %(objecttype) %(rest)'

But:

  1. That puts the %(rest) bits in your output, which you may not want.

  2. You have to actually specify the full format, so you might have to
     repeat batch-check's default format items.

I think it would be reasonable for cat-file to have an option to split
on whitespace (and if not given explicitly by the user, default to the
presence of %(rest) as we do now).

Alternatively, it would be reasonable to me to have an option for
"rev-list --objects" to have an option to suppress the filename (and
space) entirely.

I think in the longer run along those lines that "--objects" should
allow cat-file style pretty-print formats, which would eliminate the
need to pipe to cat-file in the first place. That makes this parsing
problem go away entirely, and it's way more efficient to boot (rev-list
already knows the types!).

-Peff
Junio C Hamano June 10, 2019, 4:29 p.m. UTC | #3
Jeff King <peff@peff.net> writes:

> Your patch only helps with this at all because you're using the "tree:1"
> ...
> because there you'll have actual names which cat-file will choke on. So
> it seems like this is helping only a very limited use case.
> ...
> Alternatively, it would be reasonable to me to have an option for
> "rev-list --objects" to have an option to suppress the filename (and
> space) entirely.

Yup, I think that is a more reasonable short-term change compared to
the patch being discussed, and ...

> I think in the longer run along those lines that "--objects" should
> allow cat-file style pretty-print formats, which would eliminate the
> need to pipe to cat-file in the first place. That makes this parsing
> problem go away entirely, and it's way more efficient to boot (rev-list
> already knows the types!).

... of course this makes tons of sense.

Thanks.
Emily Shaffer June 12, 2019, 7:23 p.m. UTC | #4
On Fri, Jun 07, 2019 at 08:42:45PM -0400, Eric Sunshine wrote:
> On Fri, Jun 7, 2019 at 6:59 PM Emily Shaffer <emilyshaffer@google.com> wrote:
> > Teach show_object_with_name() to avoid writing a space before a name
> > which is empty. Also teach tests for rev-list --objects --filter to not
> > require a space between the object ID and name.
> > [...]
> > Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> > ---
> > Not sure if making the whitespace optional is the right solution for the
> > test, although I couldn't come up with a cleaner approach. Maybe
> > something like this would be better, even though handling it in the
> > regex is shorter?
> >
> >   if [[ -z "$name" ]] then
> 
> In Git, we avoid Bash-isms. Just use 'test'. And, style is to place
> 'then' on its own line.
> 
>     if test -z "$name"
>     then
> 
> > diff --git a/revision.c b/revision.c
> > @@ -40,7 +40,8 @@ void show_object_with_name(FILE *out, struct object *obj, const char *name)
> >  {
> > -       fprintf(out, "%s ", oid_to_hex(&obj->oid));
> > +       fprintf(out, "%s%s", oid_to_hex(&obj->oid),
> > +               strcmp(name, "") == 0 ? "" : " ");
> >         for (p = name; *p && *p != '\n'; p++)
> >                 fputc(*p, out);
> >         fputc('\n', out);
> 
> It's subjective, but this starts to be a bit too noisy and unreadable.
> An alternative:
> 
>     fputs(oid_to_hex(...), out);
>     if (*name)
>         putc(' ', out);
> 
> > diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
> > @@ -288,7 +288,7 @@ expect_has () {
> >         hash=$(git -C r3 rev-parse $commit:$name) &&
> > -       grep "^$hash $name$" actual
> > +       grep "^$hash \?$name$" actual
> 
> This would be the first use of \? with 'grep' in the test suite. I
> would worry about portability. Your suggestion earlier to tailor
> 'grep' invocation based upon whether $name is empty would be safer.


Thanks for the review, Eric. I'm going to try again from scratch with
Peff and Junio's suggestion about adding an option to remove object
names, and I'll try to keep your general style concerns in mind when I
do so.

 - Emily
Emily Shaffer June 12, 2019, 7:37 p.m. UTC | #5
On Mon, Jun 10, 2019 at 09:29:14AM -0700, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
> 
> > Your patch only helps with this at all because you're using the "tree:1"
> > ...
> > because there you'll have actual names which cat-file will choke on. So
> > it seems like this is helping only a very limited use case.
> > ...
> > Alternatively, it would be reasonable to me to have an option for
> > "rev-list --objects" to have an option to suppress the filename (and
> > space) entirely.
> 
> Yup, I think that is a more reasonable short-term change compared to
> the patch being discussed, and ...

I like this too. Starting work on it today.

> 
> > I think in the longer run along those lines that "--objects" should
> > allow cat-file style pretty-print formats, which would eliminate the
> > need to pipe to cat-file in the first place. That makes this parsing
> > problem go away entirely, and it's way more efficient to boot (rev-list
> > already knows the types!).
> 
> ... of course this makes tons of sense.

Probably not going to start work on this now, but I will make a note to
myself to have a look if I have some spare time soon, and keep an eye
out for reviews in that area.

> 
> Thanks.

Thanks all for the feedback, I'll have a reroll soon.

 - Emily
Jeff King June 13, 2019, 3:20 p.m. UTC | #6
On Wed, Jun 12, 2019 at 12:37:45PM -0700, Emily Shaffer wrote:

> > > Alternatively, it would be reasonable to me to have an option for
> > > "rev-list --objects" to have an option to suppress the filename (and
> > > space) entirely.
> > 
> > Yup, I think that is a more reasonable short-term change compared to
> > the patch being discussed, and ...
> 
> I like this too. Starting work on it today.

Great!

> > > I think in the longer run along those lines that "--objects" should
> > > allow cat-file style pretty-print formats, which would eliminate the
> > > need to pipe to cat-file in the first place. That makes this parsing
> > > problem go away entirely, and it's way more efficient to boot (rev-list
> > > already knows the types!).
> > 
> > ... of course this makes tons of sense.
> 
> Probably not going to start work on this now, but I will make a note to
> myself to have a look if I have some spare time soon, and keep an eye
> out for reviews in that area.

That's fine. I suspect it will be quite involved, because the format
placeholders you'd want for "objects" are going to be more like
"cat-file" ones than like the existing pretty.c ones. So I think this
really implies unifying those format systems, which is a long-term
project that Olga has been working on.

-Peff

Patch
diff mbox series

diff --git a/revision.c b/revision.c
index d4aaf0ef25..260258b371 100644
--- a/revision.c
+++ b/revision.c
@@ -40,7 +40,8 @@  void show_object_with_name(FILE *out, struct object *obj, const char *name)
 {
 	const char *p;
 
-	fprintf(out, "%s ", oid_to_hex(&obj->oid));
+	fprintf(out, "%s%s", oid_to_hex(&obj->oid),
+		strcmp(name, "") == 0 ? "" : " ");
 	for (p = name; *p && *p != '\n'; p++)
 		fputc(*p, out);
 	fputc('\n', out);
diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
index acd7f5ab80..e05faa7a67 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -288,7 +288,7 @@  expect_has () {
 	name=$2 &&
 
 	hash=$(git -C r3 rev-parse $commit:$name) &&
-	grep "^$hash $name$" actual
+	grep "^$hash \?$name$" actual
 }
 
 test_expect_success 'verify tree:1 includes root trees' '