diff mbox series

pkt-line: don't check string length in packet_length()

Message ID 89d58db7-6a01-b3fa-54f0-19d5a3819eb3@web.de (mailing list archive)
State New, archived
Headers show
Series pkt-line: don't check string length in packet_length() | expand

Commit Message

René Scharfe July 1, 2023, 7:05 a.m. UTC
hex2chr() takes care not to run over the end of a short string.
101736a14c (pkt-line: extern packet_length(), 2020-05-19) turned the
input parameter of packet_length() from a string pointer into an array
of known length, making string length checks unnecessary.  Get rid of
them by using hexval() directly.

The resulting branchless code is simpler and it becomes easier to see
that the function mirrors set_packet_header().

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 pkt-line.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

--
2.41.0

Comments

Junio C Hamano July 5, 2023, 5:56 a.m. UTC | #1
René Scharfe <l.s.r@web.de> writes:

> hex2chr() takes care not to run over the end of a short string.
> 101736a14c (pkt-line: extern packet_length(), 2020-05-19) turned the
> input parameter of packet_length() from a string pointer into an array
> of known length, making string length checks unnecessary.  Get rid of
> them by using hexval() directly.

I am puzzled about the part of the above description on "making
string length checks unnecessary".  The two callers we currently
have both do pass char[4], but the compiler would not stop us from
passing a pointer to a memory region of an unknown size; if we
butcher one of the current callers

diff --git c/pkt-line.c w/pkt-line.c
index 6e022029ca..e1c49baefd 100644
--- c/pkt-line.c
+++ w/pkt-line.c
@@ -421,7 +421,7 @@ enum packet_read_status packet_read_with_status(int fd, char **src_buffer,
 		return PACKET_READ_EOF;
 	}
 
-	len = packet_length(linelen);
+	len = packet_length(buffer);
 
 	if (len < 0) {
 		if (options & PACKET_READ_GENTLE_ON_READ_ERROR)

where "buffer" is just a random piece of memory passed to the caller
and there is no such guarantee like "it at least is 4 bytes long",
we would just slurp garbage and run past the end of the buffer.

> The resulting branchless code is simpler and it becomes easier to see
> that the function mirrors set_packet_header().

I do like the resulting code, but I feel a bit uneasy to sell this
change as "the code becomes more streamlined without losing safety".
It looks more like "this change is safe for our two callers; those
adding more callers in the future are better be very careful", no?
René Scharfe July 5, 2023, 4:15 p.m. UTC | #2
Am 05.07.23 um 07:56 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
>
>> hex2chr() takes care not to run over the end of a short string.
>> 101736a14c (pkt-line: extern packet_length(), 2020-05-19) turned the
>> input parameter of packet_length() from a string pointer into an array
>> of known length, making string length checks unnecessary.  Get rid of
>> them by using hexval() directly.
>
> I am puzzled about the part of the above description on "making
> string length checks unnecessary".  The two callers we currently
> have both do pass char[4], but the compiler would not stop us from
> passing a pointer to a memory region of an unknown size; if we
> butcher one of the current callers
>
> diff --git c/pkt-line.c w/pkt-line.c
> index 6e022029ca..e1c49baefd 100644
> --- c/pkt-line.c
> +++ w/pkt-line.c
> @@ -421,7 +421,7 @@ enum packet_read_status packet_read_with_status(int fd, char **src_buffer,
>  		return PACKET_READ_EOF;
>  	}
>
> -	len = packet_length(linelen);
> +	len = packet_length(buffer);
>
>  	if (len < 0) {
>  		if (options & PACKET_READ_GENTLE_ON_READ_ERROR)
>
> where "buffer" is just a random piece of memory passed to the caller
> and there is no such guarantee like "it at least is 4 bytes long",
> we would just slurp garbage and run past the end of the buffer.

Indeed, GCC warns if you give it a short array, but not if you pass a
pointer: https://godbolt.org/z/T3xfE5W9n.  Clang doesn't care at all.

So that was wishful thinking on my part.  Sorry.  :-/

>> The resulting branchless code is simpler and it becomes easier to see
>> that the function mirrors set_packet_header().
>
> I do like the resulting code, but I feel a bit uneasy to sell this
> change as "the code becomes more streamlined without losing safety".
> It looks more like "this change is safe for our two callers; those
> adding more callers in the future are better be very careful", no?

With no way to enforce passing an array of a certain size to a function
the only safe options I see are keeping the length check, using a macro
or inlining the calculation.  Hmm.

René
Junio C Hamano July 5, 2023, 8:33 p.m. UTC | #3
René Scharfe <l.s.r@web.de> writes:

>> I do like the resulting code, but I feel a bit uneasy to sell this
>> change as "the code becomes more streamlined without losing safety".
>> It looks more like "this change is safe for our two callers; those
>> adding more callers in the future are better be very careful", no?
>
> With no way to enforce passing an array of a certain size to a function
> the only safe options I see are keeping the length check, using a macro
> or inlining the calculation.  Hmm.

We keep repeating "length check" because that is what the comment in
the function says, but even if the caller has 4-byte, that 4-byte
substring at the beginning is what it read from the untrusted side
over the network.  We should be checking if we have 4 hexadecimal
length even if we are not running beyond the end of the buffer, no?

So it may be that the comment needs to be fixed more than the code.

Thanks.
René Scharfe July 5, 2023, 9:11 p.m. UTC | #4
Am 05.07.23 um 22:33 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
>
>>> I do like the resulting code, but I feel a bit uneasy to sell this
>>> change as "the code becomes more streamlined without losing safety".
>>> It looks more like "this change is safe for our two callers; those
>>> adding more callers in the future are better be very careful", no?
>>
>> With no way to enforce passing an array of a certain size to a function
>> the only safe options I see are keeping the length check, using a macro
>> or inlining the calculation.  Hmm.

Three more ideas: Wrap the buffer in a custom struct, pass the four
bytes as a uint32_t or as four separate char parameters.  I better
stop now..

> We keep repeating "length check" because that is what the comment in
> the function says, but even if the caller has 4-byte, that 4-byte
> substring at the beginning is what it read from the untrusted side
> over the network.  We should be checking if we have 4 hexadecimal
> length even if we are not running beyond the end of the buffer, no?

Sure, that's done.  If any of the four characters is not a hexadecimal
digit then packet_length() returns a negative value, before and after
the change.

> So it may be that the comment needs to be fixed more than the code.

Which comment specifically?  The one in pkt-line.h doesn't mention
what happens if you pass in a short buffer -- leaving it undefined is
OK, I think.  And in and around pkt-line.c::packet_length() I don't
see any comment?

René
Junio C Hamano July 5, 2023, 10:27 p.m. UTC | #5
René Scharfe <l.s.r@web.de> writes:

> Sure, that's done.  If any of the four characters is not a hexadecimal
> digit then packet_length() returns a negative value, before and after
> the change.

Ah, it is the beauty of hexval[] table ;-)

So as long as we are sure that we are not running beyond the end of
the buffer, we are OK.  And as I already said, I think "this change
is safe for our two callers; those adding more callers in the future
are better be very careful" is probably good enough for this one.

hex.h:hex2chr() says "don't run over the end of short strings", but
as far as I can see it does not check any such thing; find a page of
memory, whose next page is unmapped, and pointing *s at the last
byte of that page and calling it will happily run over the end and
would cause SIGBUS.  The function assumes that such a short string
is always NUL terminated, which is not a great way to guarantee that
we do not run over the end of strings.
René Scharfe July 6, 2023, 5:07 a.m. UTC | #6
Am 06.07.23 um 00:27 schrieb Junio C Hamano:
> hex.h:hex2chr() says "don't run over the end of short strings", but
> as far as I can see it does not check any such thing; find a page of
> memory, whose next page is unmapped, and pointing *s at the last
> byte of that page and calling it will happily run over the end and
> would cause SIGBUS.  The function assumes that such a short string
> is always NUL terminated, which is not a great way to guarantee that
> we do not run over the end of strings.

Yes, hex2chr() works with C strings, i.e. those that end with a NUL
character.  An empty string is just a NUL byte, a string of length 1
is a non-NUL byte and a NUL.  The function reads one byte from the
former and otherwise two bytes -- no overrun.

If a C string loses its NUL, how could you detect its end?

René
diff mbox series

Patch

diff --git a/pkt-line.c b/pkt-line.c
index 62b4208b66..6e022029ca 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -375,8 +375,10 @@  static int get_packet_data(int fd, char **src_buf, size_t *src_size,

 int packet_length(const char lenbuf_hex[4])
 {
-	int val = hex2chr(lenbuf_hex);
-	return (val < 0) ? val : (val << 8) | hex2chr(lenbuf_hex + 2);
+	return	hexval(lenbuf_hex[0]) << 12 |
+		hexval(lenbuf_hex[1]) <<  8 |
+		hexval(lenbuf_hex[2]) <<  4 |
+		hexval(lenbuf_hex[3]);
 }

 static char *find_packfile_uri_path(const char *buffer)