[5/5] load_ref_decorations(): avoid parsing non-tag objects

Message ID	YNILCDz3LpHX7OX0@coredump.intra.peff.net (mailing list archive)
State	New, archived
Headers	show Return-Path: <git-owner@kernel.org> Date: Tue, 22 Jun 2021 12:08:40 -0400 From: Jeff King <peff@peff.net> To: git@vger.kernel.org Cc: Taylor Blau <me@ttaylorr.com> Subject: [PATCH 5/5] load_ref_decorations(): avoid parsing non-tag objects Message-ID: <YNILCDz3LpHX7OX0@coredump.intra.peff.net> References: <YNIJw/8p0F3cPfzd@coredump.intra.peff.net> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <YNIJw/8p0F3cPfzd@coredump.intra.peff.net> Precedence: bulk
Series	some "log --decorate" optimizations \| expand [0/5] some "log --decorate" optimizations [1/5] pretty.h: update and expand docstring for userformat_find_requirements() [2/5] log: avoid loading decorations for userformats that don't need it [3/5] object.h: expand docstring for lookup_unknown_object() [4/5] object.h: add lookup_object_by_type() function [5/5] load_ref_decorations(): avoid parsing non-tag objects

Jeff King June 22, 2021, 4:08 p.m. UTC

When we load the ref decorations, we parse the object pointed to by each
ref in order to get a "struct object". This is unnecessarily expensive;
we really only need the object struct, and don't even look at the parsed
contents. The exception is tags, which we do need to peel.

We can improve this by looking up the object type first (which is much
cheaper), and skipping the parse entirely for non-tags. This increases
the work slightly for annotated tags (which now do a type lookup _and_ a
parse), but decreases it a lot for other types. On balance, this seems
to be a good tradeoff.

In my git.git clone, with ~2k refs, most of which are branches, the time
to run "git log -1 --decorate" drops from 34ms to 11ms. Even on my
linux.git clone, which contains mostly tags and only a handful of
branches, the time drops from 30ms to 19ms. And on a more extreme
real-world case with ~220k refs, mostly non-tags, the time drops from
2.6s to 650ms.

That command is a lop-sided example, of course, because it does as
little non-loading work as possible. But it does show the absolute time
improvement. Even in something like a full "git log --decorate" on that
extreme repo, we'd still be saving 2s of CPU time.

Ideally we could push this even further, and avoid parsing even tags, by
relying on the packed-refs "peel" optimization (which we could do by
calling peel_iterated_oid() instead of peeling manually). But we can't
do that here. The packed-refs file only stores the bottom-layer of the
peel (so in a "tag->tag->commit" chain, it stores only the commit as the
peel result).  But the decoration code wants to peel the layers
individually, annotating the middle layers of the chain.

If the packed-refs file ever learns to store all of the peeled layers,
then we could switch to it. Or even if it stored a flag to indicate the
peel was not multi-layer (because most of them aren't), then we could
use it most of the time and fall back to a manual peel for the rare
cases.

Signed-off-by: Jeff King <peff@peff.net>
---
 log-tree.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Derrick Stolee June 22, 2021, 4:35 p.m. UTC | #1

On 6/22/2021 12:08 PM, Jeff King wrote:

> -	obj = parse_object(the_repository, oid);
> -	if (!obj)
> +	objtype = oid_object_info(the_repository, oid, NULL);
> +	if (type < 0)
>  		return 0;

Do you mean "if (objtype < 0)" here? There is a 'type' variable,
but it is an enum decoration_type and I can't find a reason why
it would be negative. oid_object_info() _does_ return -1 if there
is a problem loading the object, so that would make sense.

This is the only question I had about the entire series. Everything
else LGTM.

Thanks,
-Stolee

Ævar Arnfjörð Bjarmason June 22, 2021, 5:06 p.m. UTC | #2

On Tue, Jun 22 2021, Jeff King wrote:

> When we load the ref decorations, we parse the object pointed to by each
> ref in order to get a "struct object". This is unnecessarily expensive;
> we really only need the object struct, and don't even look at the parsed
> contents. The exception is tags, which we do need to peel.
>
> We can improve this by looking up the object type first (which is much
> cheaper), and skipping the parse entirely for non-tags. This increases
> the work slightly for annotated tags (which now do a type lookup _and_ a
> parse), but decreases it a lot for other types. On balance, this seems
> to be a good tradeoff.
>
> In my git.git clone, with ~2k refs, most of which are branches, the time
> to run "git log -1 --decorate" drops from 34ms to 11ms. Even on my
> linux.git clone, which contains mostly tags and only a handful of
> branches, the time drops from 30ms to 19ms. And on a more extreme
> real-world case with ~220k refs, mostly non-tags, the time drops from
> 2.6s to 650ms.
>
> That command is a lop-sided example, of course, because it does as
> little non-loading work as possible. But it does show the absolute time
> improvement. Even in something like a full "git log --decorate" on that
> extreme repo, we'd still be saving 2s of CPU time.
>
> Ideally we could push this even further, and avoid parsing even tags, by
> relying on the packed-refs "peel" optimization (which we could do by
> calling peel_iterated_oid() instead of peeling manually). But we can't
> do that here. The packed-refs file only stores the bottom-layer of the
> peel (so in a "tag->tag->commit" chain, it stores only the commit as the
> peel result).  But the decoration code wants to peel the layers
> individually, annotating the middle layers of the chain.
>
> If the packed-refs file ever learns to store all of the peeled layers,
> then we could switch to it. Or even if it stored a flag to indicate the
> peel was not multi-layer (because most of them aren't), then we could
> use it most of the time and fall back to a manual peel for the rare
> cases.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  log-tree.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/log-tree.c b/log-tree.c
> index 7b823786c2..8b700e9c14 100644
> --- a/log-tree.c
> +++ b/log-tree.c
> @@ -134,6 +134,7 @@ static int add_ref_decoration(const char *refname, const struct object_id *oid,
>  			      int flags, void *cb_data)
>  {
>  	struct object *obj;
> +	enum object_type objtype;
>  	enum decoration_type type = DECORATION_NONE;
>  	struct decoration_filter *filter = (struct decoration_filter *)cb_data;
>  
> @@ -155,9 +156,10 @@ static int add_ref_decoration(const char *refname, const struct object_id *oid,
>  		return 0;
>  	}
>  
> -	obj = parse_object(the_repository, oid);
> -	if (!obj)
> +	objtype = oid_object_info(the_repository, oid, NULL);
> +	if (type < 0)
>  		return 0;
> +	obj = lookup_object_by_type(the_repository, oid, objtype);

This series looks good. I just wonder if between this and my own
lookup_{blob,tree,tag,commit}_type() in [1] whether exposing some
function between what we have now in parse_object() and
parse_object_buffer() wouldn't also do this for you.

I.e. in my patch if you pass a type into parse_object_buffer() I think
you'll get the same behavior.

To be clear I see nothing wrong with this, it's more of a musing about
how some functions in object.c discover the type on their own, others
allow passing it in, sometimes (worse before that series of mine) we
relay the not-real-but-inferred-type etc.

1. https://lore.kernel.org/git/patch-10.11-a84f670ac24-20210328T021238Z-avarab@gmail.com/

Jeff King June 22, 2021, 5:06 p.m. UTC | #3

On Tue, Jun 22, 2021 at 12:35:46PM -0400, Derrick Stolee wrote:

> On 6/22/2021 12:08 PM, Jeff King wrote:
> 
> > -	obj = parse_object(the_repository, oid);
> > -	if (!obj)
> > +	objtype = oid_object_info(the_repository, oid, NULL);
> > +	if (type < 0)
> >  		return 0;
> 
> Do you mean "if (objtype < 0)" here? There is a 'type' variable,
> but it is an enum decoration_type and I can't find a reason why
> it would be negative. oid_object_info() _does_ return -1 if there
> is a problem loading the object, so that would make sense.

Whoops, thanks for catching that. I originally called it "enum
object_type type", but then of course the compiler informed that there
was already a "type" variable in the function. So I renamed it to
"objtype" but missed updating that line. But it still compiled. Yikes. :)

It doesn't trigger in the test suite because it only happens if the
repository is corrupted.

> This is the only question I had about the entire series. Everything
> else LGTM.

Thanks. Here's an amended version of the patch, in case there are no
other fixups necessary (fingers crossed).

-Peff

-- >8 --
Subject: [PATCH] load_ref_decorations(): avoid parsing non-tag objects

When we load the ref decorations, we parse the object pointed to by each
ref in order to get a "struct object". This is unnecessarily expensive;
we really only need the object struct, and don't even look at the parsed
contents. The exception is tags, which we do need to peel.

We can improve this by looking up the object type first (which is much
cheaper), and skipping the parse entirely for non-tags. This increases
the work slightly for annotated tags (which now do a type lookup _and_ a
parse), but decreases it a lot for other types. On balance, this seems
to be a good tradeoff.

In my git.git clone, with ~2k refs, most of which are branches, the time
to run "git log -1 --decorate" drops from 34ms to 11ms. Even on my
linux.git clone, which contains mostly tags and only a handful of
branches, the time drops from 30ms to 19ms. And on a more extreme
real-world case with ~220k refs, mostly non-tags, the time drops from
2.6s to 650ms.

That command is a lop-sided example, of course, because it does as
little non-loading work as possible. But it does show the absolute time
improvement. Even in something like a full "git log --decorate" on that
extreme repo, we'd still be saving 2s of CPU time.

Ideally we could push this even further, and avoid parsing even tags, by
relying on the packed-refs "peel" optimization (which we could do by
calling peel_iterated_oid() instead of peeling manually). But we can't
do that here. The packed-refs file only stores the bottom-layer of the
peel (so in a "tag->tag->commit" chain, it stores only the commit as the
peel result).  But the decoration code wants to peel the layers
individually, annotating the middle layers of the chain.

If the packed-refs file ever learns to store all of the peeled layers,
then we could switch to it. Or even if it stored a flag to indicate the
peel was not multi-layer (because most of them aren't), then we could
use it most of the time and fall back to a manual peel for the rare
cases.

Signed-off-by: Jeff King <peff@peff.net>
---
 log-tree.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/log-tree.c b/log-tree.c
index 7b823786c2..c3c8e7e1df 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -134,6 +134,7 @@ static int add_ref_decoration(const char *refname, const struct object_id *oid,
 			      int flags, void *cb_data)
 {
 	struct object *obj;
+	enum object_type objtype;
 	enum decoration_type type = DECORATION_NONE;
 	struct decoration_filter *filter = (struct decoration_filter *)cb_data;

@@ -155,9 +156,10 @@ static int add_ref_decoration(const char *refname, const struct object_id *oid,
 		return 0;
 	}

-	obj = parse_object(the_repository, oid);
-	if (!obj)
+	objtype = oid_object_info(the_repository, oid, NULL);
+	if (objtype < 0)
 		return 0;
+	obj = lookup_object_by_type(the_repository, oid, objtype);

 	if (starts_with(refname, "refs/heads/"))
 		type = DECORATION_REF_LOCAL;

Jeff King June 22, 2021, 5:09 p.m. UTC | #4

On Tue, Jun 22, 2021 at 01:06:48PM -0400, Jeff King wrote:

> > Do you mean "if (objtype < 0)" here? There is a 'type' variable,
> > but it is an enum decoration_type and I can't find a reason why
> > it would be negative. oid_object_info() _does_ return -1 if there
> > is a problem loading the object, so that would make sense.
> 
> Whoops, thanks for catching that. I originally called it "enum
> object_type type", but then of course the compiler informed that there
> was already a "type" variable in the function. So I renamed it to
> "objtype" but missed updating that line. But it still compiled. Yikes. :)
> 
> It doesn't trigger in the test suite because it only happens if the
> repository is corrupted.

I'm tempted by this as a cleanup on top (I don't want to do it inline,
since the diff is so noisy). But I'm also content to leave it.

-- >8 --
Subject: [PATCH] add_ref_decoration(): rename s/type/deco_type/

Now that we have two types (a decoration type and an object type) in the
function, let's give them both unique names to avoid accidentally using
one instead of the other.

Signed-off-by: Jeff King <peff@peff.net>
---
 log-tree.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/log-tree.c b/log-tree.c
index c3c8e7e1df..4f69ed176d 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -135,7 +135,7 @@ static int add_ref_decoration(const char *refname, const struct object_id *oid,
 {
 	struct object *obj;
 	enum object_type objtype;
-	enum decoration_type type = DECORATION_NONE;
+	enum decoration_type deco_type = DECORATION_NONE;
 	struct decoration_filter *filter = (struct decoration_filter *)cb_data;
 
 	if (filter && !ref_filter_match(refname, filter))
@@ -162,17 +162,17 @@ static int add_ref_decoration(const char *refname, const struct object_id *oid,
 	obj = lookup_object_by_type(the_repository, oid, objtype);
 
 	if (starts_with(refname, "refs/heads/"))
-		type = DECORATION_REF_LOCAL;
+		deco_type = DECORATION_REF_LOCAL;
 	else if (starts_with(refname, "refs/remotes/"))
-		type = DECORATION_REF_REMOTE;
+		deco_type = DECORATION_REF_REMOTE;
 	else if (starts_with(refname, "refs/tags/"))
-		type = DECORATION_REF_TAG;
+		deco_type = DECORATION_REF_TAG;
 	else if (!strcmp(refname, "refs/stash"))
-		type = DECORATION_REF_STASH;
+		deco_type = DECORATION_REF_STASH;
 	else if (!strcmp(refname, "HEAD"))
-		type = DECORATION_REF_HEAD;
+		deco_type = DECORATION_REF_HEAD;
 
-	add_name_decoration(type, refname, obj);
+	add_name_decoration(deco_type, refname, obj);
 	while (obj->type == OBJ_TAG) {
 		obj = ((struct tag *)obj)->tagged;
 		if (!obj)

Derrick Stolee June 22, 2021, 5:25 p.m. UTC | #5

On 6/22/2021 1:09 PM, Jeff King wrote:
> On Tue, Jun 22, 2021 at 01:06:48PM -0400, Jeff King wrote:
> 
>>> Do you mean "if (objtype < 0)" here? There is a 'type' variable,
>>> but it is an enum decoration_type and I can't find a reason why
>>> it would be negative. oid_object_info() _does_ return -1 if there
>>> is a problem loading the object, so that would make sense.
>>
>> Whoops, thanks for catching that. I originally called it "enum
>> object_type type", but then of course the compiler informed that there
>> was already a "type" variable in the function. So I renamed it to
>> "objtype" but missed updating that line. But it still compiled. Yikes. :)
>>
>> It doesn't trigger in the test suite because it only happens if the
>> repository is corrupted.
> 
> I'm tempted by this as a cleanup on top (I don't want to do it inline,
> since the diff is so noisy). But I'm also content to leave it.

Naming is hard. This change seems worth it.

Thanks,
-Stolee

Ævar Arnfjörð Bjarmason June 22, 2021, 6:27 p.m. UTC | #6

On Tue, Jun 22 2021, Jeff King wrote:

> On Tue, Jun 22, 2021 at 12:35:46PM -0400, Derrick Stolee wrote:
>
>> On 6/22/2021 12:08 PM, Jeff King wrote:
>> 
>> > -	obj = parse_object(the_repository, oid);
>> > -	if (!obj)
>> > +	objtype = oid_object_info(the_repository, oid, NULL);
>> > +	if (type < 0)
>> >  		return 0;
>> 
>> Do you mean "if (objtype < 0)" here? There is a 'type' variable,
>> but it is an enum decoration_type and I can't find a reason why
>> it would be negative. oid_object_info() _does_ return -1 if there
>> is a problem loading the object, so that would make sense.
>
> Whoops, thanks for catching that. I originally called it "enum
> object_type type", but then of course the compiler informed that there
> was already a "type" variable in the function. So I renamed it to
> "objtype" but missed updating that line. But it still compiled. Yikes. :)

[Enter Captain Hindsight]

If you use a slightly different coding style and leverage the
information the compiler has to work with you'd get it to error for you,
e.g. this on your original patch would catch it:

	diff --git a/log-tree.c b/log-tree.c
	index 8b700e9c142..7e3a011b533 100644
	--- a/log-tree.c
	+++ b/log-tree.c
	@@ -157,9 +157,12 @@ static int add_ref_decoration(const char *refname, const struct object_id *oid,
	 	}

	 	objtype = oid_object_info(the_repository, oid, NULL);
	-	if (type < 0)
	+	switch (type) {
	+	case OBJ_BAD:
	 		return 0;
	-	obj = lookup_object_by_type(the_repository, oid, objtype);
	+	default:
	+		obj = lookup_object_by_type(the_repository, oid, objtype);
	+	}

	 	if (starts_with(refname, "refs/heads/"))
	 		type = DECORATION_REF_LOCAL;

IMO the real problem is an over-reliance on C being so happy to treat
enums as ints (well, with them being ints). If you consistently use
labels you get the compiler to do the checking. For me with gcc and
clang with that on top:

	log-tree.c:161:2: error: case value ‘4294967295’ not in enumerated type ‘enum decoration_type’ [-Werror=switch]
	  case OBJ_BAD:
	  ^~~~
	log-tree.c:161:7: error: case value not in enumerated type 'enum decoration_type' [-Werror,-Wswitch]
	        case OBJ_BAD:
	             ^

I think we've disagreed on that exact point before recently, i.e. you
think we shouldn't rely on OBJ_BAD in that way, and instead check for
any negative value:
https://lore.kernel.org/git/YHCZh5nLNVEHCWV2@coredump.intra.peff.net/

This sort of thing is a good reason to pick the opposite pattern. You
get the same type checking you'd usually get with anything else in C.

Yes, it is more verbose e.g. in this case, and particularly (as noted
downthread of what I linked to) because "enum object_type" contains so
many uncommon things, and really should be split up.

In practice I don't think it's too verbose, because once you start
consistently using the pattern you'll usually not be doing conversions
all over the place, and would just do this sort of thing via a helper
that does the type checking, e.g. something like this (or anything else
where you don't lose the type & labels):

	diff --git a/log-tree.c b/log-tree.c
	index 8b700e9c142..a61fb01ba3f 100644
	--- a/log-tree.c
	+++ b/log-tree.c
	@@ -130,6 +130,30 @@ static int ref_filter_match(const char *refname,
	 	return 1;
	 }

	+static enum object_type oid_object_info_ok(struct repository *repo,
	+					   struct object_id *oid,
	+					   enum object_type *typep,
	+					   unsigned long *sizep)
	+{
	+	enum object_type type = oid_object_info(repo, oid, sizep);
	+	*typep = type;
	+	switch (type) {
	+	case OBJ_BAD:
	+		return 0;
	+	case OBJ_COMMIT:
	+	case OBJ_TREE:
	+	case OBJ_BLOB:
	+	case OBJ_TAG:
	+		return 1;
	+	case OBJ_NONE:
	+	case OBJ_OFS_DELTA:
	+	case OBJ_REF_DELTA:
	+	case OBJ_ANY:
	+	case OBJ_MAX:
	+		BUG("the enum_object type is too large!");
	+	}
	+}
	+
	 static int add_ref_decoration(const char *refname, const struct object_id *oid,
	 			      int flags, void *cb_data)
	 {
	@@ -156,8 +180,7 @@ static int add_ref_decoration(const char *refname, const struct object_id *oid,
	 		return 0;
	 	}

	-	objtype = oid_object_info(the_repository, oid, NULL);
	-	if (type < 0)
	+	if (!oid_object_info_ok(the_repository, oid, &type, NULL))
	 		return 0;
	 	obj = lookup_object_by_type(the_repository, oid, objtype);

With that pattern GCC narrowlry pulls ahead with showing 4 warnings just
about the loss of the type, with Clang at 3 :)

Jeff King June 22, 2021, 6:57 p.m. UTC | #7

On Tue, Jun 22, 2021 at 07:06:43PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > @@ -155,9 +156,10 @@ static int add_ref_decoration(const char *refname, const struct object_id *oid,
> >  		return 0;
> >  	}
> >  
> > -	obj = parse_object(the_repository, oid);
> > -	if (!obj)
> > +	objtype = oid_object_info(the_repository, oid, NULL);
> > +	if (type < 0)
> >  		return 0;
> > +	obj = lookup_object_by_type(the_repository, oid, objtype);
> 
> This series looks good. I just wonder if between this and my own
> lookup_{blob,tree,tag,commit}_type() in [1] whether exposing some
> function between what we have now in parse_object() and
> parse_object_buffer() wouldn't also do this for you.
> 
> I.e. in my patch if you pass a type into parse_object_buffer() I think
> you'll get the same behavior.

Maybe, but I'm having trouble seeing what would be a helpful
abstraction. I don't think I'd want to use parse_object_buffer() here;
we don't have a buffer at all (and obviously it could learn to handle
NULL, but that's extra code there).

parse_object_buffer() could call this lookup_object_by_type() to get the
struct, which would save it a few lines. But it still has to do the
if-else chain for each type, because it does other type-specific
things.

So I dunno. I would be happy if you came up with something, just because
it's nice to minimize the number of places that do this if-else/switch
on type. But I have a feeling it's diminishing returns in terms of
complexity. If we can at least contain it all to object.c, that's
something.

-Peff

Jeff King June 22, 2021, 7:08 p.m. UTC | #8

On Tue, Jun 22, 2021 at 08:27:53PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > Whoops, thanks for catching that. I originally called it "enum
> > object_type type", but then of course the compiler informed that there
> > was already a "type" variable in the function. So I renamed it to
> > "objtype" but missed updating that line. But it still compiled. Yikes. :)
> 
> [Enter Captain Hindsight]
> 
> If you use a slightly different coding style and leverage the
> information the compiler has to work with you'd get it to error for you,
> e.g. this on your original patch would catch it:
> 
> 	diff --git a/log-tree.c b/log-tree.c
> 	index 8b700e9c142..7e3a011b533 100644
> 	--- a/log-tree.c
> 	+++ b/log-tree.c
> 	@@ -157,9 +157,12 @@ static int add_ref_decoration(const char *refname, const struct object_id *oid,
> 	 	}
> 	 
> 	 	objtype = oid_object_info(the_repository, oid, NULL);
> 	-	if (type < 0)
> 	+	switch (type) {
> 	+	case OBJ_BAD:
> 	 		return 0;
> 	-	obj = lookup_object_by_type(the_repository, oid, objtype);
> 	+	default:
> 	+		obj = lookup_object_by_type(the_repository, oid, objtype);
> 	+	}
> 	 
> 	 	if (starts_with(refname, "refs/heads/"))
> 	 		type = DECORATION_REF_LOCAL;

Yeah, I agree that would find it in this case. I do find that style
slightly harder to read, though. And...

> IMO the real problem is an over-reliance on C being so happy to treat
> enums as ints (well, with them being ints). If you consistently use
> labels you get the compiler to do the checking. For me with gcc and
> clang with that on top:
> 	
> 	log-tree.c:161:2: error: case value ‘4294967295’ not in enumerated type ‘enum decoration_type’ [-Werror=switch]
> 	  case OBJ_BAD:
> 	  ^~~~
> 	log-tree.c:161:7: error: case value not in enumerated type 'enum decoration_type' [-Werror,-Wswitch]
> 	        case OBJ_BAD:
> 	             ^

...it would help in this case because OBJ_BAD happens to have a value
that is not defined for decoration_type. If it did, then the compiler
would be quite happy to consider them equivalent.

So I don't disagree with you exactly, but I'm not sure of the tradeoff
of always using switches instead of conditionals (which IMHO is less
readable) for more compiler safety that only works sometimes is worth
it.

> I think we've disagreed on that exact point before recently, i.e. you
> think we shouldn't rely on OBJ_BAD in that way, and instead check for
> any negative value:
> https://lore.kernel.org/git/YHCZh5nLNVEHCWV2@coredump.intra.peff.net/

To be clear, my complaint about checking for OBJ_BAD exactly is that it
closes the door on other negative return types. And indeed, the switch()
you showed above would become a silent bug if we introduced
OBJ_BAD_FOR_ANOTHER_READ as "-2" (without any compiler support, because
of the "default" case in the switch statement).

Now that's somewhat hypothetical, but in the near-term it also means
confirming that any of the functions which get converted from "int" to
"enum object_type" are not in fact passing back "-2" in any
circumstances.

That said...

> In practice I don't think it's too verbose, because once you start
> consistently using the pattern you'll usually not be doing conversions
> all over the place, and would just do this sort of thing via a helper
> that does the type checking, e.g. something like this (or anything else
> where you don't lose the type & labels):
> [...]
> 	-	objtype = oid_object_info(the_repository, oid, NULL);
> 	-	if (type < 0)
> 	+	if (!oid_object_info_ok(the_repository, oid, &type, NULL))
> 	 		return 0;

Yes, that would deal with that problem. It's definitely a different
style, but one that I could get used to. It's a lot more object oriented
("you are not allowed to do numeric logic on an object type; you can
only use these accessor methods to query it"). If we were going that
route, I would stop having "enum object_type" at all, and instead make
it "struct object_type { enum { ... } value }". That would prevent
anybody from accidentally just looking at it, and instead force people
into that object-oriented style.

I dunno. It is is a big departure from how we do things now. And the bug
here notwithstanding, I don't feel like enum confusion has generally
been a big source of error for us.

-Peff

Taylor Blau June 23, 2021, 2:46 a.m. UTC | #9

On Tue, Jun 22, 2021 at 12:08:40PM -0400, Jeff King wrote:
> If the packed-refs file ever learns to store all of the peeled layers,
> then we could switch to it. Or even if it stored a flag to indicate the
> peel was not multi-layer (because most of them aren't), then we could
> use it most of the time and fall back to a manual peel for the rare
> cases.

Yeah, I would be in favor of either of these. Of the two, the latter
seems like the simplest thing, but the former provides you all of the
information you could hope for.

I suppose that if you are already changing the format of packed-refs,
then we might as well do the thing which provides the most information
and allows us to optimize *all* cases, not just the vast majority of
them.

Of course, that's all way outside of the scope of this patch, which
shouldn't have to deal with either of those.

> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  log-tree.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/log-tree.c b/log-tree.c
> index 7b823786c2..8b700e9c14 100644
> --- a/log-tree.c
> +++ b/log-tree.c
> @@ -134,6 +134,7 @@ static int add_ref_decoration(const char *refname, const struct object_id *oid,
>  			      int flags, void *cb_data)
>  {
>  	struct object *obj;
> +	enum object_type objtype;
>  	enum decoration_type type = DECORATION_NONE;
>  	struct decoration_filter *filter = (struct decoration_filter *)cb_data;
>
> @@ -155,9 +156,10 @@ static int add_ref_decoration(const char *refname, const struct object_id *oid,
>  		return 0;
>  	}
>
> -	obj = parse_object(the_repository, oid);
> -	if (!obj)
> +	objtype = oid_object_info(the_repository, oid, NULL);
> +	if (type < 0)
>  		return 0;
> +	obj = lookup_object_by_type(the_repository, oid, objtype);

The comments about s/type/obj&/ aside, this looks good to me (and the
amended version below looks ready to be picked up, at least in my eyes).

One thing I did want to note which is elided by the limited context is
that we *do* parse tags like you say, it's just hidden in the "while
(obj->type == OBJ_TAG)" loop below.

So that's doing the right thing, but it wasn't clear from the limited
context here that this patch was immediately correct.

Thanks,
Taylor

Jeff King June 23, 2021, 9:51 p.m. UTC | #10

On Tue, Jun 22, 2021 at 10:46:40PM -0400, Taylor Blau wrote:

> On Tue, Jun 22, 2021 at 12:08:40PM -0400, Jeff King wrote:
> > If the packed-refs file ever learns to store all of the peeled layers,
> > then we could switch to it. Or even if it stored a flag to indicate the
> > peel was not multi-layer (because most of them aren't), then we could
> > use it most of the time and fall back to a manual peel for the rare
> > cases.
> 
> Yeah, I would be in favor of either of these. Of the two, the latter
> seems like the simplest thing, but the former provides you all of the
> information you could hope for.
> 
> I suppose that if you are already changing the format of packed-refs,
> then we might as well do the thing which provides the most information
> and allows us to optimize *all* cases, not just the vast majority of
> them.

One reason not to include all of them is that the list can be
arbitrarily long, and regular readers of packed-refs (who may not even
care about peeling at all) have to skip past it. That matters a little
less these since we binary-search it (but you still might be iterating
over the ref).

So I think either way it is a tradeoff, and you are making assumptions
about which cases are less likely.

If I were to work on this (and I don't have any immediate plans to do
so), I'd probably do whichever is easiest to implement, and to maintain
backwards-compatibility. And I suspect that is the "flag" approach, but
a lot would depend on the details of our parser and what it permits.

-Peff

[5/5] load_ref_decorations(): avoid parsing non-tag objects

Commit Message

Comments

Patch