[0/5] cleaning up read_object() family of functions

From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Cc: Jonathan Tan <jonathantanmy@google.com>
Date: Sat, 7 Jan 2023 08:48:30 -0500
Subject: [PATCH 0/5] cleaning up read_object() family of functions
Message-ID: <Y7l4LsEQcDT9HZ21@coredump.intra.peff.net>

Jeff King Jan. 7, 2023, 1:48 p.m. UTC

I often get confused about the difference between:

  - read_object()
  - read_object_file()
  - read_object_file_extended()
  - repo_read_object_file()

Since Jonathan's recent cleanups from 9e59b38c88 (object-file: emit
corruption errors when detected, 2022-12-14), these are mostly thin
wrappers around each other and around oid_object_info_extended().

This series shuffles things around a little more so that we are down to
just read_object_file() and repo_read_object_file(). And the
relationship there is pretty easy (and long-term we'd eventually merge
them once everyone has a repository object).

It is a net reduction in lines, even though some of the callers end up a
little longer (because they have to stuff pointers into an object_info
struct). If that's too distasteful, the middle ground is to have a
helper like:

  void *foo(struct repository *r, const struct object_id *oid,
            enum object_type *type, unsigned long *size,
	    unsigned flags)
  {
	struct object_info oi = OBJECT_INFO_INIT;
	void *content;

	oi.typep = type;
	oi.sizep = size;
	oi.contentp = &content;

	if (oid_object_info_extended(r, oid, &oi, flags) < 0)
		return NULL;
	return content;
  }

which is basically the same as read_object(), but makes it clear that
you can pass OBJECT_INFO flags. The trouble is that I could not come up
with a name for it that was not confusing. ;) So just having most places
call oid_object_info_extended() directly seemed better. It would be nice
if that function had a shorter name, too, but I left that for another
day.
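
To illustrate what "call oid_object_info_extended() directly" asks of a
caller, here is a minimal self-contained sketch of the query-struct
pattern. Everything in it is a stand-in invented for illustration (the
_sketch names, the integer id, the fixed blob); it is not git's actual
API, and unlike git it hands back a pointer to static data rather than
an allocated copy. The point is only that the caller sets the pointers
it cares about and leaves the rest NULL.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for git's object types and struct object_info. */
enum object_type_sketch { SKETCH_OBJ_BAD = -1, SKETCH_OBJ_BLOB = 3 };

struct object_info_sketch {
	enum object_type_sketch *typep;
	unsigned long *sizep;
	const void **contentp;
};

static const char sketch_blob[] = "hello";

/*
 * Stand-in for oid_object_info_extended(): fills in only the fields the
 * caller asked for (non-NULL pointers). Returns 0 on success and -1 if
 * the id is unknown. Only id 1 exists in this sketch.
 */
static int object_info_sketch_query(int id, struct object_info_sketch *oi)
{
	if (id != 1)
		return -1;
	if (oi->typep)
		*oi->typep = SKETCH_OBJ_BLOB;
	if (oi->sizep)
		*oi->sizep = strlen(sketch_blob);
	if (oi->contentp)
		*oi->contentp = sketch_blob;
	return 0;
}

/* A caller that wants only the size: a few lines of setup, no wrapper. */
static long sketch_object_size(int id)
{
	struct object_info_sketch oi = { 0 };
	unsigned long size;

	oi.sizep = &size;
	if (object_info_sketch_query(id, &oi) < 0)
		return -1;
	return (long)size;
}
```

The setup lines are the cost being weighed in the cover letter; the
benefit is that each caller states exactly which fields it wants.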

  [1/5]: object-file: inline calls to read_object()
  [2/5]: streaming: inline call to read_object_file_extended()
  [3/5]: read_object_file_extended(): drop lookup_replace option
  [4/5]: repo_read_object_file(): stop wrapping read_object_file_extended()
  [5/5]: packfile: inline custom read_object()

 object-file.c  | 52 ++++++++++++++++++--------------------------------
 object-store.h | 18 +++++------------
 packfile.c     | 26 +++++++++----------------
 streaming.c    | 11 ++++++++---
 4 files changed, 41 insertions(+), 66 deletions(-)

-Peff

Derrick Stolee Jan. 9, 2023, 3:09 p.m. UTC | #1

On 1/7/2023 8:48 AM, Jeff King wrote:
> I often get confused about the difference between:
> 
>   - read_object()
>   - read_object_file();
>   - read_object_file_extended();
>   - repo_read_object_file();
> 
> Since Jonathan's recent cleanups from 9e59b38c88 (object-file: emit
> corruption errors when detected, 2022-12-14), these are mostly thin
> wrappers around each other and around oid_object_info_extended().
> 
> This series shuffles things around a little more so that we are down to
> just read_object_file() and repo_read_object_file(). And the
> relationship there is pretty easy (and long-term we'd eventually merge
> them once everyone has a repository object).

I read the patches carefully and the translations look correct and
definitely help with this confusing mess of method names.

> It is a net reduction in lines, even though some of the callers end up a
> little longer (because they have to stuff pointers into an object_info
> struct). If that's too distasteful, the middle ground is to have a
> helper like:
> 
>   void *foo(struct repository *r, const struct object_id *oid,
>             enum object_type *type, unsigned long *size,
> 	    unsigned flags)
>   {
> 	struct object_info oi = OBJECT_INFO_INIT;
> 	void *content;
> 
> 	oi.typep = type;
> 	oi.sizep = size;
> 	oi.contentp = &content;
> 
> 	if (oid_object_info_extended(r, oid, &oi, flags) < 0)
> 		return NULL;
> 	return content;
>   }
> 
> which is basically the same as read_object(), but makes it clear that
> you can pass OBJECT_INFO flags. The trouble is that I could not come up
> with a name for it that was not confusing. ;) So just having most places
> call oid_object_info_extended() directly seemed better. It would be nice
> if that function had a shorter name, too, but I left that for another
> day.

I did think that requiring callers to create their own object_info
structs (which takes at least four lines) would be too much, but
the number of new callers is so low that I think this is a fine place
to stop.

Thanks,
-Stolee

Jeff King Jan. 11, 2023, 6:26 p.m. UTC | #2

On Mon, Jan 09, 2023 at 10:09:32AM -0500, Derrick Stolee wrote:

> I did think that requiring callers to create their own object_info
> structs (which takes at least four lines) would be too much, but
> the number of new callers is so low that I think this is a fine place
> to stop.

Yeah, that was my feeling. I do wonder if there's a way to make it
easier for callers of oid_object_info_extended(), but I couldn't come up
with anything that's nice enough to merit the complexity.

For example, here's an attempt to let the caller use designated
initializers to set up the query struct:

diff --git a/object-file.c b/object-file.c
index 80b08fc389..60ca75d755 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1700,13 +1700,12 @@ void *repo_read_object_file(struct repository *r,
 			    enum object_type *type,
 			    unsigned long *size)
 {
-	struct object_info oi = OBJECT_INFO_INIT;
 	unsigned flags = OBJECT_INFO_DIE_IF_CORRUPT | OBJECT_INFO_LOOKUP_REPLACE;
 	void *data;
+	struct object_info oi = OBJECT_INFO(.typep = type,
+					    .sizep = size,
+					    .contentp = &data);
 
-	oi.typep = type;
-	oi.sizep = size;
-	oi.contentp = &data;
 	if (oid_object_info_extended(r, oid, &oi, flags))
 	    return NULL;
 
diff --git a/object-store.h b/object-store.h
index 1a713d89d7..e894cee61b 100644
--- a/object-store.h
+++ b/object-store.h
@@ -418,7 +418,8 @@ struct object_info {
  * Initializer for a "struct object_info" that wants no items. You may
  * also memset() the memory to all-zeroes.
  */
-#define OBJECT_INFO_INIT { 0 }
+#define OBJECT_INFO(...) { 0, __VA_ARGS__ }
+#define OBJECT_INFO_INIT OBJECT_INFO()
 
 /* Invoke lookup_replace_object() on the given hash */
 #define OBJECT_INFO_LOOKUP_REPLACE 1

But:

  - it actually triggers a gcc warning, since OBJECT_INFO(.typep = foo)
    sets typep twice (once for the default "0", and once by name). In
    this case the "0" is superfluous, since that's the default, and we
    could just do:

      #define OBJECT_INFO(...) { __VA_ARGS__ }
      #define OBJECT_INFO_INIT OBJECT_INFO(0)

    but I was hoping to find a general technique for object
    initializers.

  - it's not really that much shorter than the existing code. The real
    benefit of "data = read_object(oid, type, size)" is the implicit
    number and names of the parameters. And the way to get that is to
    provide an extra function.
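
A self-contained sketch of the simpler variant from the first point,
assuming a cut-down stand-in struct (the _SKETCH names are hypothetical
and not git's definitions). With the leading "0" left out of the macro,
no field is initialized twice, so gcc's -Woverride-init (enabled by
-Wextra) has nothing to complain about:

```c
#include <stddef.h>

/* Hypothetical stand-in for git's struct object_info; three fields only. */
enum object_type_sketch { SKETCH_OBJ_NONE = 0, SKETCH_OBJ_BLOB = 3 };

struct object_info_sketch {
	enum object_type_sketch *typep;
	unsigned long *sizep;
	void **contentp;
};

/*
 * No leading "0" here. With "{ 0, __VA_ARGS__ }" instead,
 * OBJECT_INFO_SKETCH(.typep = t) would expand to "{ 0, .typep = t }",
 * which names the first field twice.
 */
#define OBJECT_INFO_SKETCH(...) { __VA_ARGS__ }
#define OBJECT_INFO_SKETCH_INIT OBJECT_INFO_SKETCH(0)

/* Counts how many of the three fields the caller requested. */
static int count_requested(const struct object_info_sketch *oi)
{
	return (oi->typep != NULL) + (oi->sizep != NULL) +
	       (oi->contentp != NULL);
}

/* Returns 10 * (fields requested by "none") + (fields requested by "two"). */
static int demo_requested_fields(void)
{
	enum object_type_sketch type;
	unsigned long size;
	struct object_info_sketch none = OBJECT_INFO_SKETCH_INIT;
	struct object_info_sketch two = OBJECT_INFO_SKETCH(.typep = &type,
							   .sizep = &size);

	return count_requested(&none) * 10 + count_requested(&two);
}
```

This only works because every unspecified member of the stand-in struct
is fine as zero, which is the limitation discussed here.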

So I think we are better off with the code that is longer but totally
obvious, unless we really want to add a function wrapper for common
queries as syntactic sugar.

-Peff

Derrick Stolee Jan. 11, 2023, 8:17 p.m. UTC | #3

On 1/11/2023 1:26 PM, Jeff King wrote:
> On Mon, Jan 09, 2023 at 10:09:32AM -0500, Derrick Stolee wrote:
> 
>> I did think that requiring callers to create their own object_info
>> structs (which takes at least four lines) would be too much, but
>> the number of new callers is so low that I think this is a fine place
>> to stop.
> 
> Yeah, that was my feeling. I do wonder if there's a way to make it
> easier for callers of oid_object_info_extended(), but I couldn't come up
> with anything that's nice enough to merit the complexity.
> 
> For example, here's an attempt to let the caller use designated
> initializers to set up the query struct:

> +	struct object_info oi = OBJECT_INFO(.typep = type,
> +					    .sizep = size,
> +					    .contentp = &data);

Your macro expansion creates this format:

	struct object_info oi = {
		.typep = type,
		.sizep = size,
		.contentp = &data,
	};

And even this expansion looks a bit better than the inline
updates:

> -	oi.typep = type;
> -	oi.sizep = size;
> -	oi.contentp = &data;

So maybe that's a preferred pattern that we could establish
by replacing the existing callers. It's also such a minor
point that I wouldn't say it's a high priority to do.

Thanks,
-Stolee

Jeff King Jan. 11, 2023, 8:30 p.m. UTC | #4

On Wed, Jan 11, 2023 at 03:17:58PM -0500, Derrick Stolee wrote:

> > For example, here's an attempt to let the caller use designated
> > initializers to set up the query struct:
> 
> > +	struct object_info oi = OBJECT_INFO(.typep = type,
> > +					    .sizep = size,
> > +					    .contentp = &data);
> 
> Your macro expansion creates this format:
> 
> 	struct object_info oi = {
> 		.typep = type,
> 		.sizep = size,
> 		.contentp = &data,
> 	};
> 
> And even this expansion looks a bit better than the inline
> updates:

There's a subtle assumption in the expanded initializer, though, which
is that everything not specified is OK to be zero-initialized. That
works for object_info, but not for arbitrary structs (which is why we
have these INIT macros in the first place).
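
As a minimal sketch of that assumption failing, consider a hypothetical
strbuf-like struct whose "buf" member must never be NULL. The names
below are stand-ins made up for the example, not git's actual
definitions. A bare designated initializer compiles without complaint
but silently leaves "buf" as NULL, while the INIT macro supplies the
required non-zero default:

```c
#include <stddef.h>

/* Shared empty buffer; static storage, so it starts out as "\0". */
static char sketch_slopbuf[1];

/* Hypothetical strbuf-like struct: "buf" must always point at a string. */
struct strbuf_sketch {
	size_t alloc;
	size_t len;
	char *buf;
};

/* The non-zero default that a plain "{ 0 }" cannot express. */
#define STRBUF_SKETCH_INIT { .buf = sketch_slopbuf }

/* True if buf points at a NUL-terminated string of length len. */
static int buf_is_usable(const struct strbuf_sketch *sb)
{
	return sb->buf != NULL && sb->buf[sb->len] == '\0';
}

/* Returns 10 * usable(good) + usable(bare). */
static int demo_zero_init_pitfall(void)
{
	struct strbuf_sketch good = STRBUF_SKETCH_INIT;
	struct strbuf_sketch bare = { .len = 0 }; /* buf becomes NULL */

	return buf_is_usable(&good) * 10 + buf_is_usable(&bare);
}
```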

-Peff

Ævar Arnfjörð Bjarmason Jan. 12, 2023, 9:21 a.m. UTC | #5

On Wed, Jan 11 2023, Jeff King wrote:

> On Mon, Jan 09, 2023 at 10:09:32AM -0500, Derrick Stolee wrote:
>
>> I did think that requiring callers to create their own object_info
>> structs (which takes at least four lines) would be too much, but
>> the number of new callers is so low that I think this is a fine place
>> to stop.
>
> Yeah, that was my feeling. I do wonder if there's a way to make it
> easier for callers of oid_object_info_extended(), but I couldn't come up
> with anything that's nice enough to merit the complexity.
>
> For example, here's an attempt to let the caller use designated
> initializers to set up the query struct:
>
> diff --git a/object-file.c b/object-file.c
> index 80b08fc389..60ca75d755 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -1700,13 +1700,12 @@ void *repo_read_object_file(struct repository *r,
>  			    enum object_type *type,
>  			    unsigned long *size)
>  {
> -	struct object_info oi = OBJECT_INFO_INIT;
>  	unsigned flags = OBJECT_INFO_DIE_IF_CORRUPT | OBJECT_INFO_LOOKUP_REPLACE;
>  	void *data;
> +	struct object_info oi = OBJECT_INFO(.typep = type,
> +					    .sizep = size,
> +					    .contentp = &data);
>  
> -	oi.typep = type;
> -	oi.sizep = size;
> -	oi.contentp = &data;
>  	if (oid_object_info_extended(r, oid, &oi, flags))
>  	    return NULL;
>  
> diff --git a/object-store.h b/object-store.h
> index 1a713d89d7..e894cee61b 100644
> --- a/object-store.h
> +++ b/object-store.h
> @@ -418,7 +418,8 @@ struct object_info {
>   * Initializer for a "struct object_info" that wants no items. You may
>   * also memset() the memory to all-zeroes.
>   */
> -#define OBJECT_INFO_INIT { 0 }
> +#define OBJECT_INFO(...) { 0, __VA_ARGS__ }
> +#define OBJECT_INFO_INIT OBJECT_INFO()
>  
>  /* Invoke lookup_replace_object() on the given hash */
>  #define OBJECT_INFO_LOOKUP_REPLACE 1
>
> But:
>
>   - it actually triggers a gcc warning, since OBJECT_INFO(.typep = foo)
>     sets typep twice (once for the default "0", and once by name). In
>     this case the "0" is superfluous, since that's the default, and we
>     could just do:
>
>       #define OBJECT_INFO(...) { __VA_ARGS__ }
>       #define OBJECT_INFO_INIT OBJECT_INFO(0)
>
>     but I was hoping to find a general technique for object
>     initializers.
>
>   - it's not really that much shorter than the existing code. The real
>     benefit of "data = read_object(oid, type, size)" is the implicit
>     number and names of the parameters. And the way to get that is to
>     provide an extra function.
>
> So I think we are better off with the code that is longer but totally
> obvious, unless we really want to add a function wrapper for common
> queries as syntactic sugar.
>
> -Peff

I agree that it's probably not worth it here, but I think you're just
tying yourself in knots in trying to define these macros in terms of
each other. This sort of thing will work if you just do:
	
	diff --git a/object-store.h b/object-store.h
	index e894cee61ba..bfcd2482dc5 100644
	--- a/object-store.h
	+++ b/object-store.h
	@@ -418,8 +418,8 @@ struct object_info {
	  * Initializer for a "struct object_info" that wants no items. You may
	  * also memset() the memory to all-zeroes.
	  */
	-#define OBJECT_INFO(...) { 0, __VA_ARGS__ }
	-#define OBJECT_INFO_INIT OBJECT_INFO()
	+#define OBJECT_INFO_INIT { 0 }
	+#define OBJECT_INFO(...) { __VA_ARGS__ }
	 
	 /* Invoke lookup_replace_object() on the given hash */
	 #define OBJECT_INFO_LOOKUP_REPLACE 1

Which is just a twist on René's suggestion from [1], i.e.:

	#define CHILD_PROCESS_INIT_EX(...) { .args = STRVEC_INIT, __VA_ARGS__ }

In that case we always need to rely on the "args" being init'd, and the
GCC warning you note is a feature, its initialization is "private", and
you should never override it.

But likewise you don't need the "0" there, if the user provides an empty
list that's their own fault, they should use OBJECT_INFO_INIT
instead.

If they do provide arguments it's an implementation detail how any
"default" arguments get init'd, if they're not clobbering any "private"
arguments we're OK.

So using an explicit "0" is the same as providing nothing in the
"*_ARGS()" case, in both cases we're just offloading that zero-init to
the language.

The only way I think you can dig yourself into a proper hole here is if
you're trying to support 0 or N args, as P99 shows that's possible, but
quite complex (and not worth it, IMO).

1. https://lore.kernel.org/git/749f6adc-928a-0978-e3a1-2ede9f07def0@web.de/

Jeff King Jan. 12, 2023, 4:16 p.m. UTC | #6

On Thu, Jan 12, 2023 at 10:21:46AM +0100, Ævar Arnfjörð Bjarmason wrote:

> I agree that it's probably not worth it here, but I think you're just
> tying yourself in knots in trying to define these macros in terms of
> each other. This sort of thing will work if you just do:
> 	
> 	diff --git a/object-store.h b/object-store.h
> 	index e894cee61ba..bfcd2482dc5 100644
> 	--- a/object-store.h
> 	+++ b/object-store.h
> 	@@ -418,8 +418,8 @@ struct object_info {
> 	  * Initializer for a "struct object_info" that wants no items. You may
> 	  * also memset() the memory to all-zeroes.
> 	  */
> 	-#define OBJECT_INFO(...) { 0, __VA_ARGS__ }
> 	-#define OBJECT_INFO_INIT OBJECT_INFO()
> 	+#define OBJECT_INFO_INIT { 0 }
> 	+#define OBJECT_INFO(...) { __VA_ARGS__ }

Right, that works because the initializer is just "0", which the
compiler can do for us implicitly. I agree it works here to omit, but as
a general solution, it doesn't.

> Which is just a twist on René's suggestion from [1], i.e.:
> 
> 	#define CHILD_PROCESS_INIT_EX(...) { .args = STRVEC_INIT, __VA_ARGS__ }
>
> In that case we always need to rely on the "args" being init'd, and the
> GCC warning you note is a feature, its initialization is "private", and
> you should never override it.

Right, and it works here because you'd never want to init .args to
anything else (which I think is what you mean by "private"). But in the
general case the defaults can't set something that the caller might want
to override, because the compiler's warning doesn't know the difference
between "override" and "oops, you specified this twice".

It's mostly a non-issue because we tend to prefer 0-initialization when
possible, but I think as a general technique this is probably opening a
can of worms for little benefit.
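
For reference, a self-contained sketch of the "private default" pattern
under discussion, using made-up stand-in types (the _sketch and _SKETCH
names are hypothetical, modeled loosely on the CHILD_PROCESS_INIT_EX
suggestion quoted above). The macro always initializes .args itself and
lets the caller add other members by name:

```c
#include <stddef.h>

/* Hypothetical stand-in for a strvec: "v" must never be NULL. */
struct strvec_sketch {
	const char **v;
	size_t nr;
	size_t alloc;
};

static const char *sketch_empty_strvec[] = { NULL };
#define STRVEC_SKETCH_INIT { .v = sketch_empty_strvec }

/* Hypothetical stand-in for a child-process description. */
struct child_sketch {
	struct strvec_sketch args;
	unsigned no_stdin:1;
	unsigned git_cmd:1;
};

/*
 * ".args" is the private default: the macro always sets it, and a
 * caller who also names ".args" names it twice, which gcc can flag
 * with -Woverride-init.
 */
#define CHILD_SKETCH_INIT_EX(...) \
	{ .args = STRVEC_SKETCH_INIT, __VA_ARGS__ }

/* Returns 100 * (args defaulted) + 10 * git_cmd + no_stdin. */
static int demo_private_default(void)
{
	struct child_sketch cp = CHILD_SKETCH_INIT_EX(.git_cmd = 1);

	return (cp.args.v == sketch_empty_strvec) * 100 +
	       cp.git_cmd * 10 + cp.no_stdin;
}
```

The sketch works because no caller has a reason to set .args to
anything else. A default that callers might legitimately want to
replace would hit the same warning.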

-Peff

Ævar Arnfjörð Bjarmason Jan. 12, 2023, 4:22 p.m. UTC | #7

On Thu, Jan 12 2023, Jeff King wrote:

> On Thu, Jan 12, 2023 at 10:21:46AM +0100, Ævar Arnfjörð Bjarmason wrote:
>
>> I agree that it's probably not worth it here, but I think you're just
>> tying yourself in knots in trying to define these macros in terms of
>> each other. This sort of thing will work if you just do:
>> 	
>> 	diff --git a/object-store.h b/object-store.h
>> 	index e894cee61ba..bfcd2482dc5 100644
>> 	--- a/object-store.h
>> 	+++ b/object-store.h
>> 	@@ -418,8 +418,8 @@ struct object_info {
>> 	  * Initializer for a "struct object_info" that wants no items. You may
>> 	  * also memset() the memory to all-zeroes.
>> 	  */
>> 	-#define OBJECT_INFO(...) { 0, __VA_ARGS__ }
>> 	-#define OBJECT_INFO_INIT OBJECT_INFO()
>> 	+#define OBJECT_INFO_INIT { 0 }
>> 	+#define OBJECT_INFO(...) { __VA_ARGS__ }
>
> Right, that works because the initializer is just "0", which the
> compiler can do for us implicitly. I agree it works here to omit, but as
> a general solution, it doesn't.
>
>> Which is just a twist on René's suggestion from [1], i.e.:
>> 
>> 	#define CHILD_PROCESS_INIT_EX(...) { .args = STRVEC_INIT, __VA_ARGS__ }
>>
>> In that case we always need to rely on the "args" being init'd, and the
>> GCC warning you note is a feature, its initialization is "private", and
>> you should never override it.
>
> Right, and it works here because you'd never want to init .args to
> anything else (which I think is what you mean by "private"). But in the
> general case the defaults can't set something that the caller might want
> to override, because the compiler's warning doesn't know the difference
> between "override" and "oops, you specified this twice".
>
> It's mostly a non-issue because we tend to prefer 0-initialization when
> possible, but I think as a general technique this is probably opening a
> can of worms for little benefit.

You're right in the general case, although I think that if we did
encounter such a use-case a perfectly good solution would be to just
suppress the GCC-specific warning with the relevant GCC-specific macro
magic, this being perfectly valid C, just something it (rightly, as it's
almost always a mistake) complains about.

But I can't think of a case where this would matter for us in practice.

We have members like "struct strbuf"'s "buf", which always needs to be
init'd, but never "maybe by the user", so the pattern above would work
there.

Then we have things like "strdup_strings", which we might imagine that
the user would override (with a hypothetical "struct string_list" that
took more arguments), but in those cases we could just add another init
macro, as "STRING_LIST_INIT_{DUP,NODUP}" does.

For any such member we could always just invert its boolean state, if it
came to that, couldn't we?
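
A minimal sketch of that "one init macro per default" approach, with a
hypothetical cut-down string list (the _sketch and _SKETCH names are
stand-ins for illustration, not the real definitions). Each macro fixes
the value of strdup_strings, so nothing is ever specified twice:

```c
#include <stddef.h>

/* Hypothetical cut-down string list. */
struct string_list_sketch {
	char **items;
	size_t nr;
	size_t alloc;
	unsigned strdup_strings:1;
};

/* Two separate initializers instead of one overridable default. */
#define STRING_LIST_SKETCH_INIT_NODUP { 0 }
#define STRING_LIST_SKETCH_INIT_DUP { .strdup_strings = 1 }

/* Returns 10 * dup.strdup_strings + nodup.strdup_strings. */
static int demo_init_variants(void)
{
	struct string_list_sketch dup = STRING_LIST_SKETCH_INIT_DUP;
	struct string_list_sketch nodup = STRING_LIST_SKETCH_INIT_NODUP;

	return dup.strdup_strings * 10 + nodup.strdup_strings;
}
```

Inverting the boolean, as suggested above, would let the zero value be
the common default, and a single "{ 0 }" initializer would then cover
most callers.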

Anyway, I agree that it's not worth pursuing this in this case.

But I think it's a neat pattern that we might find use for sooner than
later for something else.

I don't think it's worth the churn to change it at this point (except
maybe with a sufficiently clever coccinelle rule), but I think it's
already "worth it" in the case of the run-command API, if we were adding
that code today under current constraints (i.e. being able to use C99
macro features).

Jeff King Jan. 12, 2023, 4:53 p.m. UTC | #8

On Thu, Jan 12, 2023 at 05:22:04PM +0100, Ævar Arnfjörð Bjarmason wrote:

> We have members like "struct strbuf"'s "buf", which always needs to be
> init'd, but never "maybe by the user", so the pattern above would work
> there.

We've discussed in the past having a strbuf that points to an existing
buffer, over which it takes ownership. Or a const string that we'd leave
behind (but not free) if we needed to grow.

In those cases you'd want to pass in a buffer to the allocator. Of
course in the case of a strbuf those initializers would probably just be
totally separate from the regular slopbuf one, just because there's not
much else in a strbuf to initialize. You don't gain much from trying to
avoid repetition.

> Anyway, I agree that it's not worth pursuing this in this case.
> 
> But I think it's a neat pattern that we might find use for sooner than
> later for something else.

I remain unconvinced. ;) Mostly just that the lines saved versus the
amount of magic and thought doesn't seem reasonable. But it's something
we can keep in mind as new opportunities show up.

-Peff
