diff mbox series

wt-status: Don't find scissors line beyond buf len

Message ID 20240307183743.219951-1-flosch@nutanix.com (mailing list archive)
State New
Headers show
Series wt-status: Don't find scissors line beyond buf len | expand

Commit Message

Florian Schmidt March 7, 2024, 6:37 p.m. UTC
Currently, if
(a) There is a "---" divider in a commit message,
(b) At some point beyond that divider, there is a cut-line (that is,
    "# ------------------------ >8 ------------------------") in the
    commit message,
(c) the user does not explicitly set the "no-divider" option,
then "git interpret-trailers" will hang indefinitively.

This is because when (a) is true, find_end_of_log_message() will invoke
ignored_log_message_bytes() with a len that is intended to make it
ignore the part of the commit message beyond the divider. However,
ignored_log_message_bytes() calls wt_status_locate_end(), and that
function ignores the length restriction when it tries to locate the cut
line. If it manages to find one, the returned cutoff value is greater
than len. At this point, ignored_log_message_bytes() goes into an
infinite loop, because it won't advance the string parsing beyond len,
but the exit condition expects to reach cutoff.

It seems sensible to expect that wt_status_locate_end() should honour
the length parameter passed in, and doing so fixes this issue.

Signed-off-by: Florian Schmidt <flosch@nutanix.com>
Reviewed-by: Jonathan Davies <jonathan.davies@nutanix.com>
---

Side remark: Since strstr() doesn't consider len, and will always search
up to a null byte, I now wonder whether it would be safer to create a
new strbuf that only contains the len bytes we want to operate on. If
anybody ever thinks they can pass a non-null-terminated string into
wt_status_locate_end() because they already provide a len parameter,
they will not have a good time. So it's that traded off against the
slightly higher overhead of creating yet another buffer and copying a
potentially large-ish commit message around.


 t/t7513-interpret-trailers.sh | 14 ++++++++++++++
 wt-status.c                   | 13 +++++++++----
 2 files changed, 23 insertions(+), 4 deletions(-)

Comments

Junio C Hamano March 7, 2024, 7:20 p.m. UTC | #1
Florian Schmidt <flosch@nutanix.com> writes:

> Currently, if
> (a) There is a "---" divider in a commit message,
> (b) At some point beyond that divider, there is a cut-line (that is,
>     "# ------------------------ >8 ------------------------") in the
>     commit message,
> (c) the user does not explicitly set the "no-divider" option,
> then "git interpret-trailers" will hang indefinitively.

You do not have to say "Currently, if"; just "If" is sufficient.
Cf. Documentation/SubmittingPatches[[present-tense]]

> This is because when (a) is true, find_end_of_log_message() will invoke
> ignored_log_message_bytes() with a len that is intended to make it
> ignore the part of the commit message beyond the divider. However,
> ignored_log_message_bytes() calls wt_status_locate_end(), and that
> function ignores the length restriction when it tries to locate the cut
> line. If it manages to find one, the returned cutoff value is greater
> than len. At this point, ignored_log_message_bytes() goes into an
> infinite loop, because it won't advance the string parsing beyond len,
> but the exit condition expects to reach cutoff.

Good finding.  

> It seems sensible to expect that wt_status_locate_end() should honour
> the length parameter passed in, and doing so fixes this issue.

Thanks.  This is an ancient bug, not a retression from recent
changes to the trailer library [linusa CC'ed to save him from
wasting his time wondering if he broke anything].

> diff --git a/wt-status.c b/wt-status.c
> index b5a29083df..51a84575ed 100644
> --- a/wt-status.c
> +++ b/wt-status.c
> @@ -1089,14 +1089,19 @@ size_t wt_status_locate_end(const char *s, size_t len)
>  {
>  	const char *p;
>  	struct strbuf pattern = STRBUF_INIT;
> +	size_t result = len;
>  
>  	strbuf_addf(&pattern, "\n%c %s", comment_line_char, cut_line);
>  	if (starts_with(s, pattern.buf + 1))
> -		len = 0;
> -	else if ((p = strstr(s, pattern.buf)))
> -		len = p - s + 1;
> +		result = 0;
> +	else if ((p = strstr(s, pattern.buf))) {
> +		result = p - s + 1;
> +		if (result > len) {
> +			result = len;
> +		}
> +	}
>  	strbuf_release(&pattern);
> -	return len;
> +	return result;
>  }

Looks correct, but we probably can make the fix a lot more isolated
into a single block, like the attached patch.  How does this look?

 wt-status.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git c/wt-status.c w/wt-status.c
index b5a29083df..511f37cfe0 100644
--- c/wt-status.c
+++ w/wt-status.c
@@ -1093,8 +1093,11 @@ size_t wt_status_locate_end(const char *s, size_t len)
 	strbuf_addf(&pattern, "\n%c %s", comment_line_char, cut_line);
 	if (starts_with(s, pattern.buf + 1))
 		len = 0;
-	else if ((p = strstr(s, pattern.buf)))
-		len = p - s + 1;
+	else if ((p = strstr(s, pattern.buf))) {
+		int newlen = p - s + 1;
+		if (newlen < len)
+			len = newlen;
+	}
 	strbuf_release(&pattern);
 	return len;
 }
Junio C Hamano March 7, 2024, 7:35 p.m. UTC | #2
Florian Schmidt <flosch@nutanix.com> writes:

> Side remark: Since strstr() doesn't consider len, and will always search
> up to a null byte, I now wonder whether it would be safer to create a
> new strbuf that only contains the len bytes we want to operate on.

That is a valid concern in general, but does not seem to apply to
the current codebase.  Thanks for being careful.

Two of the three callers of wt_status_locate_end() feed the pointer
into a piece of memory that is owned by strbuf, which guarantees
that the memory has an extra NUL to terminate it as a string even if
you did

	strbuf buf = STRBUF_INIT;
	strbuf_addch(&buf, 'A');

The other one is in commit.c:ignored_log_message_bytes() that still
takes <buf, len> as input, but again, two of its three callers call
it with a pointer that points at the beginning of memory held by an
instance of strbuf.

That leaves us trailer.c:find_end_of_log_message() the only one to
worry about, but it uses strlen() on the pointer before calling
ignored_log_message_bytes() so the region of the memory pointed at
by the pointer is assumed to be NUL-terminated already, and
presumably (I didn't follow the logic there too closely) the length
is also computed within that NUL-terminated string.
Junio C Hamano March 7, 2024, 8:47 p.m. UTC | #3
From: Florian Schmidt <flosch@nutanix.com>
Date: Thu, 7 Mar 2024 18:37:38 +0000
Subject: [PATCH] wt-status: don't find scissors line beyond buf len

If

  (a) There is a "---" divider in a commit message,

  (b) At some point beyond that divider, there is a cut-line (that is,
      "# ------------------------ >8 ------------------------") in the
      commit message,

  (c) the user does not explicitly set the "no-divider" option,

then "git interpret-trailers" will hang indefinitively.

This is because when (a) is true, find_end_of_log_message() will invoke
ignored_log_message_bytes() with a len that is intended to make it
ignore the part of the commit message beyond the divider. However,
ignored_log_message_bytes() calls wt_status_locate_end(), and that
function ignores the length restriction when it tries to locate the cut
line. If it manages to find one, the returned cutoff value is greater
than len. At this point, ignored_log_message_bytes() goes into an
infinite loop, because it won't advance the string parsing beyond len,
but the exit condition expects to reach cutoff.

Make wt_status_locate_end() honor the length parameter passed in, to
fix this issue.

In general, if wt_status_locate_end() is given a piece of the memory
that lacks NUL at all, strstr() may continue across page boundaries
and run into an unmapped page.  For our current callers, this is not
a problem, as all of them except one uses a memory owned by a strbuf
(which guarantees an implicit NUL-termination after its payload),
and the one exeption in trailer.c:find_end_of_log_message() uses
strlen() to compute the length before calling this function.

Signed-off-by: Florian Schmidt <flosch@nutanix.com>
Reviewed-by: Jonathan Davies <jonathan.davies@nutanix.com>
[jc: tweaked the commit log message and the implementation a bit]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * So here is the version I queued.  I have a new paragraph at the
   end of the log message to talk about use of strstr() and how it
   is OK in the current codebase.

 t/t7513-interpret-trailers.sh | 14 ++++++++++++++
 wt-status.c                   |  7 +++++--
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/t/t7513-interpret-trailers.sh b/t/t7513-interpret-trailers.sh
index 6602790b5f..5efe70d675 100755
--- a/t/t7513-interpret-trailers.sh
+++ b/t/t7513-interpret-trailers.sh
@@ -1476,4 +1476,18 @@ test_expect_success 'suppress --- handling' '
 	test_cmp expected actual
 '
 
+test_expect_success 'handling of --- lines in conjunction with cut-lines' '
+	echo "my-trailer: here" >expected &&
+
+	git interpret-trailers --parse >actual <<-\EOF &&
+	subject
+
+	my-trailer: here
+	---
+	# ------------------------ >8 ------------------------
+	EOF
+
+	test_cmp expected actual
+'
+
 test_done
diff --git a/wt-status.c b/wt-status.c
index 40b59be478..16c1b9b7ee 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1007,8 +1007,11 @@ size_t wt_status_locate_end(const char *s, size_t len)
 	strbuf_addf(&pattern, "\n%c %s", comment_line_char, cut_line);
 	if (starts_with(s, pattern.buf + 1))
 		len = 0;
-	else if ((p = strstr(s, pattern.buf)))
-		len = p - s + 1;
+	else if ((p = strstr(s, pattern.buf))) {
+		size_t newlen = p - s + 1;
+		if (newlen < len)
+			len = newlen;
+	}
 	strbuf_release(&pattern);
 	return len;
 }
Eric Sunshine March 7, 2024, 9:09 p.m. UTC | #4
On Thu, Mar 7, 2024 at 3:47 PM Junio C Hamano <gitster@pobox.com> wrote:
>  * So here is the version I queued.  I have a new paragraph at the
>    end of the log message to talk about use of strstr() and how it
>    is OK in the current codebase.
> [jc: tweaked the commit log message and the implementation a bit]
>
> From: Florian Schmidt <flosch@nutanix.com>
>
> In general, if wt_status_locate_end() is given a piece of the memory
> that lacks NUL at all, strstr() may continue across page boundaries
> and run into an unmapped page.  For our current callers, this is not
> a problem, as all of them except one uses a memory owned by a strbuf
> (which guarantees an implicit NUL-termination after its payload),
> and the one exeption in trailer.c:find_end_of_log_message() uses
> strlen() to compute the length before calling this function.

s/exeption/exception/
Kristoffer Haugsbakk March 7, 2024, 9:15 p.m. UTC | #5
On Thu, Mar 7, 2024, at 21:47, Junio C Hamano wrote:
> Signed-off-by: Florian Schmidt <flosch@nutanix.com>
> Reviewed-by: Jonathan Davies <jonathan.davies@nutanix.com>
> [jc: tweaked the commit log message and the implementation a bit]

Just a question. Given the imperative mood principle/rule, why are these
bracket changelog lines always written in the past tense?

Cheers
Junio C Hamano March 7, 2024, 9:24 p.m. UTC | #6
"Kristoffer Haugsbakk" <code@khaugsbakk.name> writes:

> On Thu, Mar 7, 2024, at 21:47, Junio C Hamano wrote:
>> Signed-off-by: Florian Schmidt <flosch@nutanix.com>
>> Reviewed-by: Jonathan Davies <jonathan.davies@nutanix.com>
>> [jc: tweaked the commit log message and the implementation a bit]
>
> Just a question. Given the imperative mood principle/rule, why are these
> bracket changelog lines always written in the past tense?

These are not giving orders to the code to become like so.  The
trailer block records what happend to the patch in chronological
order---think of those written there at one level higher level,
"meta" comments.
Eric Sunshine March 7, 2024, 9:26 p.m. UTC | #7
On Thu, Mar 7, 2024 at 4:24 PM Junio C Hamano <gitster@pobox.com> wrote:
> "Kristoffer Haugsbakk" <code@khaugsbakk.name> writes:
> > On Thu, Mar 7, 2024, at 21:47, Junio C Hamano wrote:
> >> [jc: tweaked the commit log message and the implementation a bit]
> >
> > Just a question. Given the imperative mood principle/rule, why are these
> > bracket changelog lines always written in the past tense?
>
> These are not giving orders to the code to become like so.  The
> trailer block records what happend to the patch in chronological
> order---think of those written there at one level higher level,
> "meta" comments.

Also, they are not always written in past tense[*].

[*]: https://lore.kernel.org/git/20240112171910.11131-1-ericsunshine@charter.net/
Kristoffer Haugsbakk March 7, 2024, 9:30 p.m. UTC | #8
On Thu, Mar 7, 2024 at 4:24 PM Junio C Hamano <gitster@pobox.com> wrote:
>> "Kristoffer Haugsbakk" <code@khaugsbakk.name> writes:
>> > On Thu, Mar 7, 2024, at 21:47, Junio C Hamano wrote:
>> >> [jc: tweaked the commit log message and the implementation a bit]
>> >
>> > Just a question. Given the imperative mood principle/rule, why are these
>> > bracket changelog lines always written in the past tense?
>>
>> These are not giving orders to the code to become like so.  The
>> trailer block records what happend to the patch in chronological
>> order---think of those written there at one level higher level,
>> "meta" comments.

And when I think about it the trailers themselves are of course in the
past tense…

Thanks guys

On Thu, Mar 7, 2024, at 22:26, Eric Sunshine wrote:
>
> Also, they are not always written in past tense[*].
>
> [*]:
> https://lore.kernel.org/git/20240112171910.11131-1-ericsunshine@charter.net/
Florian Schmidt March 8, 2024, 9:08 a.m. UTC | #9
That's a quick review cycle, thanks!

On 07/03/2024 19:20, Junio C Hamano wrote:
> You do not have to say "Currently, if"; just "If" is sufficient.
> Cf. Documentation/SubmittingPatches[[present-tense]]

ack

> Looks correct, but we probably can make the fix a lot more isolated
> into a single block, like the attached patch.  How does this look?
> 
>   wt-status.c | 7 +++++--
>   1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git c/wt-status.c w/wt-status.c
> index b5a29083df..511f37cfe0 100644
> --- c/wt-status.c
> +++ w/wt-status.c
> @@ -1093,8 +1093,11 @@ size_t wt_status_locate_end(const char *s, size_t len)
>   	strbuf_addf(&pattern, "\n%c %s", comment_line_char, cut_line);
>   	if (starts_with(s, pattern.buf + 1))
>   		len = 0;
> -	else if ((p = strstr(s, pattern.buf)))
> -		len = p - s + 1;
> +	else if ((p = strstr(s, pattern.buf))) {
> +		int newlen = p - s + 1;
> +		if (newlen < len)
> +			len = newlen;
> +	}
>   	strbuf_release(&pattern);
>   	return len;
>   }

That looks good to me, thanks! For context, I had sent in the slightly 
larger patch because at first look, I got confused by the fact that 
wt_status_locate_end takes len as a parameter, but then seemingly 
doesn't read it at all and only writes to it. (Of course, if neither the 
if nor the if-else branch are hit, len is in fact read when it is 
returned unchanged.) So I made the patch a bit larger in the hope that 
the next cursory reader might not get confused.
But this option is a much more minimal patch, and functionally the same. 
So it comes down to style, and you have a much better feeling for that 
in this code base, so I'm happy to go with this.

Do you want me to send this version as a v2 to you + the list as per the 
documentation?

Cheers,
flosch
Florian Schmidt March 8, 2024, 9:13 a.m. UTC | #10
On 07/03/2024 19:35, Junio C Hamano wrote:
> Florian Schmidt <flosch@nutanix.com> writes:
> 
>> Side remark: Since strstr() doesn't consider len, and will always search
>> up to a null byte, I now wonder whether it would be safer to create a
>> new strbuf that only contains the len bytes we want to operate on.
> 
> That is a valid concern in general, but does not seem to apply to
> the current codebase.  Thanks for being careful.

Thanks, that confirms my cursory look at the consumers of the function. 
If you think that it's unlikely that in the future, a new user of this 
function would provide a non-terminated string, then there is no need 
for action. I guess the aim is to use strbufs wherever suitable in the 
first place, anyway, and those won't have this issue?

Cheers,
flosch
Junio C Hamano March 8, 2024, 3:26 p.m. UTC | #11
Florian Schmidt <flosch@nutanix.com> writes:

> Do you want me to send this version as a v2 to you + the list as per
> the documentation?

It would be the technically correct way to do so, but as a short-cut
to reduce a round-trip, if you are happy with the version I queued
(should be found by fetching the 'seen' branch from any of the
mirrors), you can just say "that looks fine" and we can be done with
this patch.

Thanks.
Florian Schmidt March 8, 2024, 5:43 p.m. UTC | #12
On 08/03/2024 15:26, Junio C Hamano wrote:
> It would be the technically correct way to do so, but as a short-cut
> to reduce a round-trip, if you are happy with the version I queued
> (should be found by fetching the 'seen' branch from any of the
> mirrors), you can just say "that looks fine" and we can be done with
> this patch.

I had a look, and it looks fine (though there's the one typo already 
pointed out by Eric: s/exeption/exception/). You can go ahead and use 
that version of the patch.

Cheers,
flosch
Linus Arver April 6, 2024, 1:37 a.m. UTC | #13
Junio C Hamano <gitster@pobox.com> writes:

> Florian Schmidt <flosch@nutanix.com> writes:
>
>> It seems sensible to expect that wt_status_locate_end() should honour
>> the length parameter passed in, and doing so fixes this issue.
>
> Thanks.  This is an ancient bug, not a retression from recent
> changes to the trailer library [linusa CC'ed to save him from
> wasting his time wondering if he broke anything].

Ack. Very much appreciated!

Also, sorry for not responding sooner --- my email search queries so far
were using "replied" as a condition so I only saw this just now while
searching for the "trailer" term.

Cheers
diff mbox series

Patch

diff --git a/t/t7513-interpret-trailers.sh b/t/t7513-interpret-trailers.sh
index ec9c6de114..3d3e13ccf8 100755
--- a/t/t7513-interpret-trailers.sh
+++ b/t/t7513-interpret-trailers.sh
@@ -1935,4 +1935,18 @@  test_expect_success 'suppressing --- does not disable cut-line handling' '
 	test_cmp expected actual
 '
 
+test_expect_success 'handling of --- lines in conjunction with cut-lines' '
+	echo "my-trailer: here" >expected &&
+
+	git interpret-trailers --parse >actual <<-\EOF &&
+	subject
+
+	my-trailer: here
+	---
+	# ------------------------ >8 ------------------------
+	EOF
+
+	test_cmp expected actual
+'
+
 test_done
diff --git a/wt-status.c b/wt-status.c
index b5a29083df..51a84575ed 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1089,14 +1089,19 @@  size_t wt_status_locate_end(const char *s, size_t len)
 {
 	const char *p;
 	struct strbuf pattern = STRBUF_INIT;
+	size_t result = len;
 
 	strbuf_addf(&pattern, "\n%c %s", comment_line_char, cut_line);
 	if (starts_with(s, pattern.buf + 1))
-		len = 0;
-	else if ((p = strstr(s, pattern.buf)))
-		len = p - s + 1;
+		result = 0;
+	else if ((p = strstr(s, pattern.buf))) {
+		result = p - s + 1;
+		if (result > len) {
+			result = len;
+		}
+	}
 	strbuf_release(&pattern);
-	return len;
+	return result;
 }
 
 void wt_status_append_cut_line(struct strbuf *buf)