diff mbox series

[RFC] object-file API: add a format_loose_header() function

Message ID RFC-patch-1.1-bda62567f6b-20211220T120740Z-avarab@gmail.com (mailing list archive)
State New, archived
Headers show
Series [RFC] object-file API: add a format_loose_header() function | expand

Commit Message

Ævar Arnfjörð Bjarmason Dec. 20, 2021, 12:10 p.m. UTC
Add a convenience function to wrap the xsnprintf() command that
generates loose object headers. This code was copy/pasted in various
parts of the codebase, let's define it in one place and re-use it from
there.

All except one caller of it had a valid "enum object_type" for us,
it's only write_object_file_prepare() which might need to deal with
"git hash-object --literally" and a potential garbage type. Let's have
the primary API use an "enum object_type", and define an *_extended()
function that can take an arbitrary "const char *" for the type.

See [1] for the discussion that prompted this patch, i.e. new code in
object-file.c that wanted to copy/paste the xsnprintf() invocation.

1. https://lore.kernel.org/git/211213.86bl1l9bfz.gmgdl@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---

On Fri, Dec 17 2021, Han Xin wrote:

> From: Han Xin <hanxin.hx@alibaba-inc.com>
>
> There are 3 places where "xsnprintf" is used to generate the object
> header, and I originally planned to add a fourth in the latter patch.
>
> According to Ævar Arnfjörð Bjarmason’s suggestion, although it's just
> one line, it's also code that's very central to git, so reafactor them
> into a function which will help later readability.
>
> Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> Signed-off-by: Han Xin <hanxin.hx@alibaba-inc.com>

I came up with this after my comment on the earlier round suggesting
to factor out that header formatting. I don't know if this more
thorough approach is worth it or if you'd like to replace your change
with this one, but just posting it here as an RFC.

 builtin/index-pack.c |  3 +--
 bulk-checkin.c       |  4 ++--
 cache.h              | 21 +++++++++++++++++++++
 http-push.c          |  2 +-
 object-file.c        | 14 +++++++++++---
 5 files changed, 36 insertions(+), 8 deletions(-)

Comments

Philip Oakley Dec. 20, 2021, 12:48 p.m. UTC | #1
Hi Ævar,
(catching up after a week away, and noticed your patch today..)

On 20/12/2021 12:10, Ævar Arnfjörð Bjarmason wrote:
> Add a convenience function to wrap the xsnprintf() command that
> generates loose object headers. This code was copy/pasted in various
> parts of the codebase, let's define it in one place and re-use it from
> there.
>
> All except one caller of it had a valid "enum object_type" for us,
> it's only write_object_file_prepare() which might need to deal with
> "git hash-object --literally" and a potential garbage type. Let's have
> the primary API use an "enum object_type", and define an *_extended()
> function that can take an arbitrary "const char *" for the type.

I recently completed a PR in the Git for Windows build that is focused on
"git hash-object --literally" as a starter for LLP64 large file (>4GB)
compatibility.
(https://github.com/git-for-windows/git/pull/3533), which Dscho has
merged (cc'd).

I'm not sure that the `extended` version will work as expected across
the test suite
as multiple fake object types are tried, though I only skimmed the patch.

I'd support the general thrust, but just wanted to synchronise any changes.

Philip
>
> See [1] for the discussion that prompted this patch, i.e. new code in
> object-file.c that wanted to copy/paste the xsnprintf() invocation.
>
> 1. https://lore.kernel.org/git/211213.86bl1l9bfz.gmgdl@evledraar.gmail.com/
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>
> On Fri, Dec 17 2021, Han Xin wrote:
>
>> From: Han Xin <hanxin.hx@alibaba-inc.com>
>>
>> There are 3 places where "xsnprintf" is used to generate the object
>> header, and I originally planned to add a fourth in the latter patch.
>>
>> According to Ævar Arnfjörð Bjarmason’s suggestion, although it's just
>> one line, it's also code that's very central to git, so reafactor them
>> into a function which will help later readability.
>>
>> Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>> Signed-off-by: Han Xin <hanxin.hx@alibaba-inc.com>
> I came up with this after my comment on the earlier round suggesting
> to factor out that header formatting. I don't know if this more
> thorough approach is worth it or if you'd like to replace your change
> with this one, but just posting it here as an RFC.
>
>  builtin/index-pack.c |  3 +--
>  bulk-checkin.c       |  4 ++--
>  cache.h              | 21 +++++++++++++++++++++
>  http-push.c          |  2 +-
>  object-file.c        | 14 +++++++++++---
>  5 files changed, 36 insertions(+), 8 deletions(-)
>
> diff --git a/builtin/index-pack.c b/builtin/index-pack.c
> index c23d01de7dc..900c6539f68 100644
> --- a/builtin/index-pack.c
> +++ b/builtin/index-pack.c
> @@ -449,8 +449,7 @@ static void *unpack_entry_data(off_t offset, unsigned long size,
>  	int hdrlen;
>  
>  	if (!is_delta_type(type)) {
> -		hdrlen = xsnprintf(hdr, sizeof(hdr), "%s %"PRIuMAX,
> -				   type_name(type),(uintmax_t)size) + 1;
> +		hdrlen = format_loose_header(hdr, sizeof(hdr), type, (uintmax_t)size);
>  		the_hash_algo->init_fn(&c);
>  		the_hash_algo->update_fn(&c, hdr, hdrlen);
>  	} else
> diff --git a/bulk-checkin.c b/bulk-checkin.c
> index 8785b2ac806..446dea7c516 100644
> --- a/bulk-checkin.c
> +++ b/bulk-checkin.c
> @@ -220,8 +220,8 @@ static int deflate_to_pack(struct bulk_checkin_state *state,
>  	if (seekback == (off_t) -1)
>  		return error("cannot find the current offset");
>  
> -	header_len = xsnprintf((char *)obuf, sizeof(obuf), "%s %" PRIuMAX,
> -			       type_name(type), (uintmax_t)size) + 1;
> +	header_len = format_loose_header((char *)obuf, sizeof(obuf),
> +					 type, (uintmax_t)size);
>  	the_hash_algo->init_fn(&ctx);
>  	the_hash_algo->update_fn(&ctx, obuf, header_len);
>  
> diff --git a/cache.h b/cache.h
> index d5cafba17d4..ccece21a4a2 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1309,6 +1309,27 @@ enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
>  						    unsigned long bufsiz,
>  						    struct strbuf *hdrbuf);
>  
> +/**
> + * format_loose_header() is a thin wrapper around s xsnprintf() that
> + * writes the initial "<type> <obj-len>" part of the loose object
> + * header. It returns the size that snprintf() returns + 1.
> + *
> + * The format_loose_header_extended() function allows for writing a
> + * type_name that's not one of the "enum object_type" types. This is
> + * used for "git hash-object --literally". Pass in a OBJ_NONE as the
> + * type, and a non-NULL "type_str" to do that.
> + *
> + * format_loose_header() is a convenience wrapper for
> + * format_loose_header_extended().
> + */
> +int format_loose_header_extended(char *str, size_t size, enum object_type type,
> +				 const char *type_str, size_t objsize);
> +static inline int format_loose_header(char *str, size_t size,
> +				      enum object_type type, size_t objsize)
> +{
> +	return format_loose_header_extended(str, size, type, NULL, objsize);
> +}
> +
>  /**
>   * parse_loose_header() parses the starting "<type> <len>\0" of an
>   * object. If it doesn't follow that format -1 is returned. To check
> diff --git a/http-push.c b/http-push.c
> index 3309aaf004a..d1a8619e0af 100644
> --- a/http-push.c
> +++ b/http-push.c
> @@ -363,7 +363,7 @@ static void start_put(struct transfer_request *request)
>  	git_zstream stream;
>  
>  	unpacked = read_object_file(&request->obj->oid, &type, &len);
> -	hdrlen = xsnprintf(hdr, sizeof(hdr), "%s %"PRIuMAX , type_name(type), (uintmax_t)len) + 1;
> +	hdrlen = format_loose_header(hdr, sizeof(hdr), type, (uintmax_t)len);
>  
>  	/* Set it up */
>  	git_deflate_init(&stream, zlib_compression_level);
> diff --git a/object-file.c b/object-file.c
> index eac67f6f5f9..d94609ee48d 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -1009,6 +1009,14 @@ void *xmmap(void *start, size_t length,
>  	return ret;
>  }
>  
> +int format_loose_header_extended(char *str, size_t size, enum object_type type,
> +				 const char *typestr, size_t objsize)
> +{
> +	const char *s = type == OBJ_NONE ? typestr : type_name(type);
> +
> +	return xsnprintf(str, size, "%s %"PRIuMAX, s, (uintmax_t)objsize) + 1;
> +}
> +
>  /*
>   * With an in-core object data in "map", rehash it to make sure the
>   * object name actually matches "oid" to detect object corruption.
> @@ -1037,7 +1045,7 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
>  		return -1;
>  
>  	/* Generate the header */
> -	hdrlen = xsnprintf(hdr, sizeof(hdr), "%s %"PRIuMAX , type_name(obj_type), (uintmax_t)size) + 1;
> +	hdrlen = format_loose_header(hdr, sizeof(hdr), obj_type, size);
>  
>  	/* Sha1.. */
>  	r->hash_algo->init_fn(&c);
> @@ -1737,7 +1745,7 @@ static void write_object_file_prepare(const struct git_hash_algo *algo,
>  	git_hash_ctx c;
>  
>  	/* Generate the header */
> -	*hdrlen = xsnprintf(hdr, *hdrlen, "%s %"PRIuMAX , type, (uintmax_t)len)+1;
> +	*hdrlen = format_loose_header_extended(hdr, *hdrlen, OBJ_NONE, type, len);
>  
>  	/* Sha1.. */
>  	algo->init_fn(&c);
> @@ -2009,7 +2017,7 @@ int force_object_loose(const struct object_id *oid, time_t mtime)
>  	buf = read_object(the_repository, oid, &type, &len);
>  	if (!buf)
>  		return error(_("cannot read object for %s"), oid_to_hex(oid));
> -	hdrlen = xsnprintf(hdr, sizeof(hdr), "%s %"PRIuMAX , type_name(type), (uintmax_t)len) + 1;
> +	hdrlen = format_loose_header(hdr, sizeof(hdr), type, len);
>  	ret = write_loose_object(oid, hdr, hdrlen, buf, len, mtime, 0);
>  	free(buf);
>
Junio C Hamano Dec. 20, 2021, 10:25 p.m. UTC | #2
Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Add a convenience function to wrap the xsnprintf() command that
> generates loose object headers. This code was copy/pasted in various
> parts of the codebase, let's define it in one place and re-use it from
> there.
> ...
> +/**
> + * format_loose_header() is a thin wrapper around s xsnprintf() that

The name should have "object" somewhere in it.  Not all readers can
be expected to know that you meant "loose" to be an acceptable short
hand for "loose object".

That nit aside, I think it is a good idea to give people a common
helper function to call.  I am undecided if it is a good idea to
make it take enum or "const char *"; most everybody should be able
to say

	format_object_header(type_name(OBJ_COMMIT), ...)

just fine, so two variants might be overkill, just to allow 

	format_object_header(OBJ_COMMIT, ...)

and to forbid

	format_object_header("connit", ...)

I dunno.
Ævar Arnfjörð Bjarmason Dec. 21, 2021, 1:42 a.m. UTC | #3
On Mon, Dec 20 2021, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
>> Add a convenience function to wrap the xsnprintf() command that
>> generates loose object headers. This code was copy/pasted in various
>> parts of the codebase, let's define it in one place and re-use it from
>> there.
>> ...
>> +/**
>> + * format_loose_header() is a thin wrapper around s xsnprintf() that
>
> The name should have "object" somewhere in it.  Not all readers can
> be expected to know that you meant "loose" to be an acceptable short
> hand for "loose object".

*nod*

> That nit aside, I think it is a good idea to give people a common
> helper function to call.  I am undecided if it is a good idea to
> make it take enum or "const char *"; most everybody should be able
> to say
>
> 	format_object_header(type_name(OBJ_COMMIT), ...)
>
> just fine, so two variants might be overkill, just to allow 
>
> 	format_object_header(OBJ_COMMIT, ...)
>
> and to forbid
>
> 	format_object_header("connit", ...)
>
> I dunno.

Ultimately only a single API caller in hash-object.c really cares about
something else than the enum.

I've got some patches locally to convert e.g. write_object_file() to use
the enum, and it removes the need for some callers to convert enum to
char *, only to have other things convert it back.

So I think for any new APIs it makes sense to work towards sidelining
the hash-object.c --literally caller.
Junio C Hamano Dec. 21, 2021, 2:11 a.m. UTC | #4
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> I've got some patches locally to convert e.g. write_object_file() to use
> the enum, and it removes the need for some callers to convert enum to
> char *, only to have other things convert it back.
>
> So I think for any new APIs it makes sense to work towards sidelining
> the hash-object.c --literally caller.

Your logic is backwards to argue "because I did something this way,
it makes sense to do it this way"?
Ævar Arnfjörð Bjarmason Dec. 21, 2021, 2:27 a.m. UTC | #5
On Mon, Dec 20 2021, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> I've got some patches locally to convert e.g. write_object_file() to use
>> the enum, and it removes the need for some callers to convert enum to
>> char *, only to have other things convert it back.
>>
>> So I think for any new APIs it makes sense to work towards sidelining
>> the hash-object.c --literally caller.
>
> Your logic is backwards to argue "because I did something this way,
> it makes sense to do it this way"?

No, it's that if you look at the write_object_file() and
hash_object_file() callers in-tree now many, including in object-file.c
itself are taking an "enum object_type" only to convert it to a string,
and then we'll in turn sometimes convert that to the "enum object_type"
again at some lower level.

That API inconsistency dates back to at least Linus's a733cb606fe
(Change pack file format. Hopefully for the last time., 2005-06-28).

I'm just pointing out that I have local patches that prove that a lot of
back & forth is done for no good reason, and that this is one of the
codepaths that's tangentally involved. So it makes sense in this case to
make any new API take "enum object_type" as the primary interface.
Han Xin Dec. 21, 2021, 11:43 a.m. UTC | #6
On Mon, Dec 20, 2021 at 8:10 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> I came up with this after my comment on the earlier round suggesting
> to factor out that header formatting. I don't know if this more
> thorough approach is worth it or if you'd like to replace your change
> with this one, but just posting it here as an RFC.
>

I will take this patch and rename the function name from
"format_loose_header()" to "format_object_header()".

Thanks
-Han Xin
diff mbox series

Patch

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index c23d01de7dc..900c6539f68 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -449,8 +449,7 @@  static void *unpack_entry_data(off_t offset, unsigned long size,
 	int hdrlen;
 
 	if (!is_delta_type(type)) {
-		hdrlen = xsnprintf(hdr, sizeof(hdr), "%s %"PRIuMAX,
-				   type_name(type),(uintmax_t)size) + 1;
+		hdrlen = format_loose_header(hdr, sizeof(hdr), type, (uintmax_t)size);
 		the_hash_algo->init_fn(&c);
 		the_hash_algo->update_fn(&c, hdr, hdrlen);
 	} else
diff --git a/bulk-checkin.c b/bulk-checkin.c
index 8785b2ac806..446dea7c516 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -220,8 +220,8 @@  static int deflate_to_pack(struct bulk_checkin_state *state,
 	if (seekback == (off_t) -1)
 		return error("cannot find the current offset");
 
-	header_len = xsnprintf((char *)obuf, sizeof(obuf), "%s %" PRIuMAX,
-			       type_name(type), (uintmax_t)size) + 1;
+	header_len = format_loose_header((char *)obuf, sizeof(obuf),
+					 type, (uintmax_t)size);
 	the_hash_algo->init_fn(&ctx);
 	the_hash_algo->update_fn(&ctx, obuf, header_len);
 
diff --git a/cache.h b/cache.h
index d5cafba17d4..ccece21a4a2 100644
--- a/cache.h
+++ b/cache.h
@@ -1309,6 +1309,27 @@  enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
 						    unsigned long bufsiz,
 						    struct strbuf *hdrbuf);
 
+/**
+ * format_loose_header() is a thin wrapper around s xsnprintf() that
+ * writes the initial "<type> <obj-len>" part of the loose object
+ * header. It returns the size that snprintf() returns + 1.
+ *
+ * The format_loose_header_extended() function allows for writing a
+ * type_name that's not one of the "enum object_type" types. This is
+ * used for "git hash-object --literally". Pass in a OBJ_NONE as the
+ * type, and a non-NULL "type_str" to do that.
+ *
+ * format_loose_header() is a convenience wrapper for
+ * format_loose_header_extended().
+ */
+int format_loose_header_extended(char *str, size_t size, enum object_type type,
+				 const char *type_str, size_t objsize);
+static inline int format_loose_header(char *str, size_t size,
+				      enum object_type type, size_t objsize)
+{
+	return format_loose_header_extended(str, size, type, NULL, objsize);
+}
+
 /**
  * parse_loose_header() parses the starting "<type> <len>\0" of an
  * object. If it doesn't follow that format -1 is returned. To check
diff --git a/http-push.c b/http-push.c
index 3309aaf004a..d1a8619e0af 100644
--- a/http-push.c
+++ b/http-push.c
@@ -363,7 +363,7 @@  static void start_put(struct transfer_request *request)
 	git_zstream stream;
 
 	unpacked = read_object_file(&request->obj->oid, &type, &len);
-	hdrlen = xsnprintf(hdr, sizeof(hdr), "%s %"PRIuMAX , type_name(type), (uintmax_t)len) + 1;
+	hdrlen = format_loose_header(hdr, sizeof(hdr), type, (uintmax_t)len);
 
 	/* Set it up */
 	git_deflate_init(&stream, zlib_compression_level);
diff --git a/object-file.c b/object-file.c
index eac67f6f5f9..d94609ee48d 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1009,6 +1009,14 @@  void *xmmap(void *start, size_t length,
 	return ret;
 }
 
+int format_loose_header_extended(char *str, size_t size, enum object_type type,
+				 const char *typestr, size_t objsize)
+{
+	const char *s = type == OBJ_NONE ? typestr : type_name(type);
+
+	return xsnprintf(str, size, "%s %"PRIuMAX, s, (uintmax_t)objsize) + 1;
+}
+
 /*
  * With an in-core object data in "map", rehash it to make sure the
  * object name actually matches "oid" to detect object corruption.
@@ -1037,7 +1045,7 @@  int check_object_signature(struct repository *r, const struct object_id *oid,
 		return -1;
 
 	/* Generate the header */
-	hdrlen = xsnprintf(hdr, sizeof(hdr), "%s %"PRIuMAX , type_name(obj_type), (uintmax_t)size) + 1;
+	hdrlen = format_loose_header(hdr, sizeof(hdr), obj_type, size);
 
 	/* Sha1.. */
 	r->hash_algo->init_fn(&c);
@@ -1737,7 +1745,7 @@  static void write_object_file_prepare(const struct git_hash_algo *algo,
 	git_hash_ctx c;
 
 	/* Generate the header */
-	*hdrlen = xsnprintf(hdr, *hdrlen, "%s %"PRIuMAX , type, (uintmax_t)len)+1;
+	*hdrlen = format_loose_header_extended(hdr, *hdrlen, OBJ_NONE, type, len);
 
 	/* Sha1.. */
 	algo->init_fn(&c);
@@ -2009,7 +2017,7 @@  int force_object_loose(const struct object_id *oid, time_t mtime)
 	buf = read_object(the_repository, oid, &type, &len);
 	if (!buf)
 		return error(_("cannot read object for %s"), oid_to_hex(oid));
-	hdrlen = xsnprintf(hdr, sizeof(hdr), "%s %"PRIuMAX , type_name(type), (uintmax_t)len) + 1;
+	hdrlen = format_loose_header(hdr, sizeof(hdr), type, len);
 	ret = write_loose_object(oid, hdr, hdrlen, buf, len, mtime, 0);
 	free(buf);