diff mbox series

[v2,6/6] trailer: use offsets for trailer_start/trailer_end

Message ID 0463066ebe0889b72b6a1f6c344f2de127458391.1694240177.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Commit b5e75f87b5881811a7eee8ea6aa10cd43f1fe430
Headers show
Series Trailer readability cleanups | expand

Commit Message

Linus Arver Sept. 9, 2023, 6:16 a.m. UTC
From: Linus Arver <linusa@google.com>

Previously these fields in the trailer_info struct were of type "const
char *" and pointed to positions in the input string directly (to the
start and end positions of the trailer block).

Use offsets to make the intended usage less ambiguous. We only need to
reference the input string in format_trailer_info(), so update that
function to take a pointer to the input.

Signed-off-by: Linus Arver <linusa@google.com>
---
 trailer.c | 17 ++++++++---------
 trailer.h |  7 +++----
 2 files changed, 11 insertions(+), 13 deletions(-)

Comments

Junio C Hamano Sept. 11, 2023, 7:01 p.m. UTC | #1
"Linus Arver via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Linus Arver <linusa@google.com>
>
> Previously these fields in the trailer_info struct were of type "const
> char *" and pointed to positions in the input string directly (to the
> start and end positions of the trailer block).
>
> Use offsets to make the intended usage less ambiguous. We only need to
> reference the input string in format_trailer_info(), so update that
> function to take a pointer to the input.

Hmm, I am not sure if this is an improvement.  If the underlying
buffer can be reallocated (to grow), the approach to use the offsets
certainly is easier to deal with, as they will stay valid even after
such a reallocation.  But you lose the obvious sentinel value NULL
that can mean something special, and have to make the readers aware
of the local convention you happened to have picked with a comment
like ...

> Signed-off-by: Linus Arver <linusa@google.com>
> ---
>  trailer.c | 17 ++++++++---------
>  trailer.h |  7 +++----
>  2 files changed, 11 insertions(+), 13 deletions(-)
> ...
>  	/*
> -	 * Pointers to the start and end of the trailer block found. If there
> -	 * is no trailer block found, these 2 pointers point to the end of the
> -	 * input string.
> +	 * Offsets to the trailer block start and end positions in the input
> +	 * string. If no trailer block is found, these are set to 0.
>  	 */

... this, simply because there is no obvious sentinel value for an
unsigned integral type; even if you count MAX_ULONG and its friends,
they are not as obvious as NULL for pointer types.

So, I dunno.
Linus Arver Sept. 14, 2023, 1:21 a.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:

> "Linus Arver via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> From: Linus Arver <linusa@google.com>
>>
>> Previously these fields in the trailer_info struct were of type "const
>> char *" and pointed to positions in the input string directly (to the
>> start and end positions of the trailer block).
>>
>> Use offsets to make the intended usage less ambiguous. We only need to
>> reference the input string in format_trailer_info(), so update that
>> function to take a pointer to the input.
>
> Hmm, I am not sure if this is an improvement.  If the underlying
> buffer can be reallocated (to grow), the approach to use the offsets
> certainly is easier to deal with, as they will stay valid even after
> such a reallocation.  But you lose the obvious sentinel value NULL
> that can mean something special

True.

> and have to make the readers aware
> of the local convention you happened to have picked with a comment
> like ...
>
>> Signed-off-by: Linus Arver <linusa@google.com>
>> ---
>>  trailer.c | 17 ++++++++---------
>>  trailer.h |  7 +++----
>>  2 files changed, 11 insertions(+), 13 deletions(-)
>> ...
>>  	/*
>> -	 * Pointers to the start and end of the trailer block found. If there
>> -	 * is no trailer block found, these 2 pointers point to the end of the
>> -	 * input string.
>> +	 * Offsets to the trailer block start and end positions in the input
>> +	 * string. If no trailer block is found, these are set to 0.
>>  	 */
>
> ... this, simply because there is no obvious sentinel value for an
> unsigned integral type; even if you count MAX_ULONG and its friends,
> they are not as obvious as NULL for pointer types.

I agree that there is no trustworthy sentinel value for an unsigned
integral type.

On the other hand, we never used NULL as a sentinel value before even
when they were const char pointers --- the current comment for these
fields which say ...

    If there is no trailer block found, these 2 pointers point to the end of the
    input string.

... sounds somewhat arbitrary to me (and I don't think we care about
this property in trailer.c, and AFAICS it's also not something that
consumers should be aware of). Consumers of the trailer_info struct
could also just see if "info->trailer_nr > 0" to check whether any
trailers were found, although if we're merging Patch 1 [1] of this
series the consumers will not have easy access any more to any of the
trailer_info fields, and they should instead be using a public-facing
function that does the "were trailers found" check.

> So, I dunno.

If the "we don't use NULL sentinel values currently anyway" argument is
convincing enough, I'm happy to add this to the commit message on a
reroll. But I'm also OK with dropping this patch. Thoughts?

[1] https://lore.kernel.org/git/pull.1563.git.1691211879.gitgitgadget@gmail.com/T/#m8f1b5f1eb346331658c8c7b3e057a4ee31223664
Linus Arver Sept. 14, 2023, 3:18 a.m. UTC | #3
Linus Arver <linusa@google.com> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> "Linus Arver via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>
>>> From: Linus Arver <linusa@google.com>
>>>
>>> Previously these fields in the trailer_info struct were of type "const
>>> char *" and pointed to positions in the input string directly (to the
>>> start and end positions of the trailer block).
>>>
>>> Use offsets to make the intended usage less ambiguous. We only need to
>>> reference the input string in format_trailer_info(), so update that
>>> function to take a pointer to the input.
>>
>> Hmm, I am not sure if this is an improvement.  If the underlying
>> buffer can be reallocated (to grow), the approach to use the offsets
>> certainly is easier to deal with, as they will stay valid even after
>> such a reallocation.  But you lose the obvious sentinel value NULL
>> that can mean something special
>
> True.
>
>> and have to make the readers aware
>> of the local convention you happened to have picked with a comment
>> like ...
>>
>>> Signed-off-by: Linus Arver <linusa@google.com>
>>> ---
>>>  trailer.c | 17 ++++++++---------
>>>  trailer.h |  7 +++----
>>>  2 files changed, 11 insertions(+), 13 deletions(-)
>>> ...
>>>  	/*
>>> -	 * Pointers to the start and end of the trailer block found. If there
>>> -	 * is no trailer block found, these 2 pointers point to the end of the
>>> -	 * input string.
>>> +	 * Offsets to the trailer block start and end positions in the input
>>> +	 * string. If no trailer block is found, these are set to 0.
>>>  	 */

I've just realized that the new comment "If no trailer block is found,
these are set to 0" is perhaps not always true in the current version
of this patch. This is because this patch did a mechanical type
conversion of the old fields from "const char *" to "size_t", and we
still run

    trailer_end = find_trailer_end(str, trailer_search_boundary);
    trailer_start = find_trailer_start(str, trailer_end);

inside trailer_info_get() (not visible in the patch context lines). I
will update this patch to make the comment "If no trailer block is found,
these are set to 0" true.
diff mbox series

Patch

diff --git a/trailer.c b/trailer.c
index 6ad2fbca942..00326720e81 100644
--- a/trailer.c
+++ b/trailer.c
@@ -1055,7 +1055,6 @@  void process_trailers(const char *file,
 	LIST_HEAD(head);
 	struct strbuf sb = STRBUF_INIT;
 	struct trailer_info info;
-	size_t trailer_end;
 	FILE *outfile = stdout;
 
 	ensure_configured();
@@ -1066,11 +1065,10 @@  void process_trailers(const char *file,
 		outfile = create_in_place_tempfile(file);
 
 	parse_trailers(&info, sb.buf, &head, opts);
-	trailer_end = info.trailer_end - sb.buf;
 
 	/* Print the lines before the trailers */
 	if (!opts->only_trailers)
-		fwrite(sb.buf, 1, info.trailer_start - sb.buf, outfile);
+		fwrite(sb.buf, 1, info.trailer_start, outfile);
 
 	if (!opts->only_trailers && !info.blank_line_before_trailer)
 		fprintf(outfile, "\n");
@@ -1092,7 +1090,7 @@  void process_trailers(const char *file,
 
 	/* Print the lines after the trailers as is */
 	if (!opts->only_trailers)
-		fwrite(sb.buf + trailer_end, 1, sb.len - trailer_end, outfile);
+		fwrite(sb.buf + info.trailer_end, 1, sb.len - info.trailer_end, outfile);
 
 	if (opts->in_place)
 		if (rename_tempfile(&trailers_tempfile, file))
@@ -1104,7 +1102,7 @@  void process_trailers(const char *file,
 void trailer_info_get(struct trailer_info *info, const char *str,
 		      const struct process_trailer_options *opts)
 {
-	int patch_start, trailer_end, trailer_start;
+	size_t patch_start, trailer_end = 0, trailer_start = 0;
 	struct strbuf **trailer_lines, **ptr;
 	char **trailer_strings = NULL;
 	size_t nr = 0, alloc = 0;
@@ -1139,8 +1137,8 @@  void trailer_info_get(struct trailer_info *info, const char *str,
 
 	info->blank_line_before_trailer = ends_with_blank_line(str,
 							       trailer_start);
-	info->trailer_start = str + trailer_start;
-	info->trailer_end = str + trailer_end;
+	info->trailer_start = trailer_start;
+	info->trailer_end = trailer_end;
 	info->trailers = trailer_strings;
 	info->trailer_nr = nr;
 }
@@ -1155,6 +1153,7 @@  void trailer_info_release(struct trailer_info *info)
 
 static void format_trailer_info(struct strbuf *out,
 				const struct trailer_info *info,
+				const char *msg,
 				const struct process_trailer_options *opts)
 {
 	size_t origlen = out->len;
@@ -1164,7 +1163,7 @@  static void format_trailer_info(struct strbuf *out,
 	if (!opts->only_trailers && !opts->unfold && !opts->filter &&
 	    !opts->separator && !opts->key_only && !opts->value_only &&
 	    !opts->key_value_separator) {
-		strbuf_add(out, info->trailer_start,
+		strbuf_add(out, msg + info->trailer_start,
 			   info->trailer_end - info->trailer_start);
 		return;
 	}
@@ -1219,7 +1218,7 @@  void format_trailers_from_commit(struct strbuf *out, const char *msg,
 	struct trailer_info info;
 
 	trailer_info_get(&info, msg, opts);
-	format_trailer_info(out, &info, opts);
+	format_trailer_info(out, &info, msg, opts);
 	trailer_info_release(&info);
 }
 
diff --git a/trailer.h b/trailer.h
index a689d768c79..13fbf0dcd12 100644
--- a/trailer.h
+++ b/trailer.h
@@ -37,11 +37,10 @@  struct trailer_info {
 	int blank_line_before_trailer;
 
 	/*
-	 * Pointers to the start and end of the trailer block found. If there
-	 * is no trailer block found, these 2 pointers point to the end of the
-	 * input string.
+	 * Offsets to the trailer block start and end positions in the input
+	 * string. If no trailer block is found, these are set to 0.
 	 */
-	const char *trailer_start, *trailer_end;
+	size_t trailer_start, trailer_end;
 
 	/*
 	 * Array of trailers found.