[5/5] packfile: inline custom read_object()

Message ID Y7l4vQwRZzGtxlBB@coredump.intra.peff.net (mailing list archive)
State New, archived
Series cleaning up read_object() family of functions

Commit Message

Jeff King Jan. 7, 2023, 1:50 p.m. UTC
When the pack code was split into its own file[1], it got a copy of the
static read_object() function. But there's only one caller here, so we
could just inline it. And it's worth doing so, as the name read_object()
invites comparisons to the public read_object_file(), but the two don't
behave quite the same.

[1] The move happened over several commits, but the relevant one here is
    f1d8130be0 (pack: move clear_delta_base_cache(), packed_object_info(),
    unpack_entry(), 2017-08-18).

Signed-off-by: Jeff King <peff@peff.net>
---
 packfile.c | 26 +++++++++-----------------
 1 file changed, 9 insertions(+), 17 deletions(-)

Comments

Ævar Arnfjörð Bjarmason Jan. 12, 2023, 9:01 a.m. UTC | #1
On Sat, Jan 07 2023, Jeff King wrote:

> When the pack code was split into its own file[1], it got a copy of the
> static read_object() function. But there's only one caller here, so we
> could just inline it. And it's worth doing so, as the name read_object()
> invites comparisons to the public read_object_file(), but the two don't
> behave quite the same.
>
> [1] The move happened over several commits, but the relevant one here is
>     f1d8130be0 (pack: move clear_delta_base_cache(), packed_object_info(),
>     unpack_entry(), 2017-08-18).
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  packfile.c | 26 +++++++++-----------------
>  1 file changed, 9 insertions(+), 17 deletions(-)
>
> diff --git a/packfile.c b/packfile.c
> index c0d7dd93f4..79e21ab18e 100644
> --- a/packfile.c
> +++ b/packfile.c
> @@ -1650,22 +1650,6 @@ struct unpack_entry_stack_ent {
>  	unsigned long size;
>  };
>  
> -static void *read_object(struct repository *r,
> -			 const struct object_id *oid,
> -			 enum object_type *type,
> -			 unsigned long *size)
> -{
> -	struct object_info oi = OBJECT_INFO_INIT;
> -	void *content;
> -	oi.typep = type;
> -	oi.sizep = size;
> -	oi.contentp = &content;
> -
> -	if (oid_object_info_extended(r, oid, &oi, 0) < 0)
> -		return NULL;
> -	return content;
> -}
> -
>  void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
>  		   enum object_type *final_type, unsigned long *final_size)
>  {
> @@ -1798,14 +1782,22 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
>  			uint32_t pos;
>  			struct object_id base_oid;
>  			if (!(offset_to_pack_pos(p, obj_offset, &pos))) {
> +				struct object_info oi = OBJECT_INFO_INIT;
> +
>  				nth_packed_object_id(&base_oid, p,
>  						     pack_pos_to_index(p, pos));
>  				error("failed to read delta base object %s"
>  				      " at offset %"PRIuMAX" from %s",
>  				      oid_to_hex(&base_oid), (uintmax_t)obj_offset,
>  				      p->pack_name);
>  				mark_bad_packed_object(p, &base_oid);
> -				base = read_object(r, &base_oid, &type, &base_size);
> +
> +				oi.typep = &type;
> +				oi.sizep = &base_size;
> +				oi.contentp = &base;
> +				if (oid_object_info_extended(r, &base_oid, &oi, 0) < 0)
> +					base = NULL;
> +
>  				external_base = base;
>  			}
>  		}

This isn't introducing a behavior difference, in fact it's diligently
bending over backwards to preserve existing behavior, but I don't think
we need to do so, and shouldn't have this "base = NULL" line.

Here we're within an "if" block where we tested that "base == NULL"
(which is why we're trying to populate it).

Before, when we had read_object(), re-assigning to "base" here was the
obvious thing to do, but now this seems like undue and incomplete
paranoia.

If oid_object_info_extended() fails, why can't we trust that it didn't touch
our "base"? And if we can't trust that, why are we trusting that it left
"type" and "base_size" untouched?

I think squashing this in would be much better:
	
	diff --git a/packfile.c b/packfile.c
	index 79e21ab18e7..f45017422a1 100644
	--- a/packfile.c
	+++ b/packfile.c
	@@ -1795,10 +1795,8 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
	 				oi.typep = &type;
	 				oi.sizep = &base_size;
	 				oi.contentp = &base;
	-				if (oid_object_info_extended(r, &base_oid, &oi, 0) < 0)
	-					base = NULL;
	-
	-				external_base = base;
	+				if (!oid_object_info_extended(r, &base_oid, &oi, 0))
	+					external_base = base;
	 			}
	 		}

Not only do we stop second-guessing whether our "base" was left alone,
we also use the return value of oid_object_info_extended() to guard the
assignment to "external_base" (which is NULL at this point too).

Jeff King Jan. 12, 2023, 4:29 p.m. UTC | #2
On Thu, Jan 12, 2023 at 10:01:28AM +0100, Ævar Arnfjörð Bjarmason wrote:

> > -				base = read_object(r, &base_oid, &type, &base_size);
> > +
> > +				oi.typep = &type;
> > +				oi.sizep = &base_size;
> > +				oi.contentp = &base;
> > +				if (oid_object_info_extended(r, &base_oid, &oi, 0) < 0)
> > +					base = NULL;
> > +
> >  				external_base = base;
> >  			}
> >  		}
> 
> This isn't introducing a behavior difference, in fact it's diligently
> bending over backwards to preserve existing behavior, but I don't think
> we need to do so, and shouldn't have this "base = NULL" line.
> 
> Here we're within an "if" block where we tested that "base == NULL"
> (which is why we're trying to populate it)
> 
> Before, when we had read_object(), re-assigning to "base" here was the
> obvious thing to do, but now this seems like undue and incomplete
> paranoia.

I think it's the same paranoia that was in read_object(). There it
catches the error and returns NULL, rather than the probably-NULL
"content" (though to be fair, it simply did not initialize the pointer,
so it would have had to do that to depend on it).

I agree it's probably being overly defensive. But I don't think
oid_object_info_extended() makes any promises, and it's not completely
clear to me if packed_object_info() could return a non-NULL entry here
on an error (e.g., if packed_to_object_type() fails even after we pulled
out the content).

So probably yes, we could depend on that (and if not, arguably we should
be fixing oid_object_info_extended(), because we are probably leaking a
buffer in that case). But we definitely shouldn't be doing it in the
middle of another patch.
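
To make the concern concrete, here is a minimal standalone sketch; it is not
Git code. fetch_info() and struct info_request are hypothetical stand-ins for
oid_object_info_extended() and struct object_info, and the stand-in is
deliberately written to fill the content pointer before reporting failure.
Whether the real function can ever behave that way is exactly the open
question, so treat this as an illustration of the two caller shapes under
that assumption only. Unlike the patch, the defensive caller here also frees
the buffer, which is an addition for the sake of the sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct object_info: only out-pointers. */
struct info_request {
	int *typep;
	unsigned long *sizep;
	void **contentp;
};

/*
 * Hypothetical stand-in for oid_object_info_extended(). With fail_late
 * set, it fills *contentp and then returns an error, modelling a callee
 * that makes no promise about its outputs on failure.
 */
static int fetch_info(struct info_request *req, int fail_late)
{
	static const char data[] = "payload";

	assert(req);
	if (req->contentp) {
		*req->contentp = malloc(sizeof(data));
		if (!*req->contentp)
			return -1;
		memcpy(*req->contentp, data, sizeof(data));
	}
	if (fail_late)
		return -1; /* error after the out-pointer was written */
	if (req->sizep)
		*req->sizep = sizeof(data) - 1;
	if (req->typep)
		*req->typep = 1;
	return 0;
}

/* Defensive shape: never keep a buffer that came from a failed call. */
static void *read_defensively(int fail_late, unsigned long *size)
{
	struct info_request req = { 0 };
	void *base = NULL;
	int type = 0;

	req.typep = &type;
	req.sizep = size;
	req.contentp = &base;
	if (fetch_info(&req, fail_late) < 0) {
		free(base); /* not in the patch; free(NULL) is a no-op */
		base = NULL;
	}
	return base;
}

/* Trusting shape: only the second pointer is guarded by the return value. */
static void *read_trusting(int fail_late, void **external_base)
{
	struct info_request req = { 0 };
	unsigned long size = 0;
	void *base = NULL;
	int type = 0;

	req.typep = &type;
	req.sizep = &size;
	req.contentp = &base;
	*external_base = NULL;
	if (!fetch_info(&req, fail_late))
		*external_base = base;
	return base; /* may be non-NULL even though the call failed */
}
```

With the late-failing stand-in, read_defensively() returns NULL, while
read_trusting() leaves external_base NULL but still hands back a non-NULL
base whose type and size were never filled in.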

> If oid_object_info_extended() fails, why can't we trust that it didn't touch
> our "base"? And if we can't trust that, why are we trusting that it left
> "type" and "base_size" untouched?

My assumption is that "base" gated access to "type" and "base_size". So
as long as "!base", we do not look at the other two.

> I think squashing this in would be much better:
> 	
> 	diff --git a/packfile.c b/packfile.c
> 	index 79e21ab18e7..f45017422a1 100644
> 	--- a/packfile.c
> 	+++ b/packfile.c
> 	@@ -1795,10 +1795,8 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
> 	 				oi.typep = &type;
> 	 				oi.sizep = &base_size;
> 	 				oi.contentp = &base;
> 	-				if (oid_object_info_extended(r, &base_oid, &oi, 0) < 0)
> 	-					base = NULL;
> 	-
> 	-				external_base = base;
> 	+				if (!oid_object_info_extended(r, &base_oid, &oi, 0))
> 	+					external_base = base;
> 	 			}
> 	 		}
> 
> Not only aren't we second-guessing that our "base" was left alone, we're
> using the return value of oid_object_info_extended() to guard that
> assignment to "external_base" instead (it's NULL at this point too).

I don't think we need to guard the assignment (we know it will be NULL
if we saw an error). I don't mind if you want to do that simplification,
but it should be a separate patch on top, if at all.

-Peff

Patch

diff --git a/packfile.c b/packfile.c
index c0d7dd93f4..79e21ab18e 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1650,22 +1650,6 @@  struct unpack_entry_stack_ent {
 	unsigned long size;
 };
 
-static void *read_object(struct repository *r,
-			 const struct object_id *oid,
-			 enum object_type *type,
-			 unsigned long *size)
-{
-	struct object_info oi = OBJECT_INFO_INIT;
-	void *content;
-	oi.typep = type;
-	oi.sizep = size;
-	oi.contentp = &content;
-
-	if (oid_object_info_extended(r, oid, &oi, 0) < 0)
-		return NULL;
-	return content;
-}
-
 void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
 		   enum object_type *final_type, unsigned long *final_size)
 {
@@ -1798,14 +1782,22 @@  void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
 			uint32_t pos;
 			struct object_id base_oid;
 			if (!(offset_to_pack_pos(p, obj_offset, &pos))) {
+				struct object_info oi = OBJECT_INFO_INIT;
+
 				nth_packed_object_id(&base_oid, p,
 						     pack_pos_to_index(p, pos));
 				error("failed to read delta base object %s"
 				      " at offset %"PRIuMAX" from %s",
 				      oid_to_hex(&base_oid), (uintmax_t)obj_offset,
 				      p->pack_name);
 				mark_bad_packed_object(p, &base_oid);
-				base = read_object(r, &base_oid, &type, &base_size);
+
+				oi.typep = &type;
+				oi.sizep = &base_size;
+				oi.contentp = &base;
+				if (oid_object_info_extended(r, &base_oid, &oi, 0) < 0)
+					base = NULL;
+
 				external_base = base;
 			}
 		}