diff mbox series

[01/10] loose_object_info(): BUG() on inflating content with unknown type

Message ID 20250225062824.GA1293961@coredump.intra.peff.net (mailing list archive)
State New
Headers show
Series some zlib inflating bug fixes | expand

Commit Message

Jeff King Feb. 25, 2025, 6:28 a.m. UTC
After unpack_loose_header() returns, it will have inflated not only the
object header, but possibly some bytes of the object content. When we
call unpack_loose_rest() to extract the actual content, it finds those
extra bytes by skipping past the header's terminating NUL in the buffer.
Like this:

  int bytes = strlen(buffer) + 1;
  n = stream->total_out - bytes;
  ...
  memcpy(buf, (char *) buffer + bytes, n);

This won't work with the OBJECT_INFO_ALLOW_UNKNOWN_TYPE flag, as there
we allow a header of arbitrary size. We put into a strbuf, but feed only
the final 32-byte chunk we read to unpack_loose_rest(). In that case
stream->total_out may unexpectedly large, and thus our "n" will be
large, causing an out-of-bounds read (we do check it against our
allocated buffer size, which prevents an out-of-bounds write).

Probably this could be made to work by feeding the strbuf to
unpack_loose_rest(), along with adjusting some types (e.g., "bytes"
would need to be a size_t, since it is no longer operating on a 32-byte
buffer).

But I don't think it's possible to actually trigger this in practice.
The only caller who passes ALLOW_UNKNOWN_TYPE is cat-file, which only
allows it with the "-t" and "-s" options (neither of which access the
content). There is one way you can _almost_ trigger it: the oid compat
routines (i.e., accessing sha1 via sha256 names and vice versa) will
convert objects on the fly (which requires access to the content) using
the same flags that were passed in. So in theory this:

  t='some very large type field that causes an extra inflate call'
  sha1_oid=$(git hash-object -w -t "$t" file)
  sha256_oid=$(git rev-parse --output-object-format=sha256 $sha1_oid)
  git cat-file --allow-unknown-type -s $sha256_oid

would try to access the content. But it doesn't work, because using
compat objects requires an entry in the .git/objects/loose-object-idx
file, and we don't generate such an entry for non-standard types (see
the "compat" section of write_object_file_literally()).

If we use "t=blob" instead, then it does access the compat object, but
it doesn't trigger the problem (because "blob" is a standard short type
name, and it fits in the initial 32-byte buffer).

So given that this is almost a memory error bug, I think it's worth
addressing. But because we can't actually trigger the situation, I'm
hesitant to try a fix that we can't run. Instead let's document the
restriction and protect ourselves from the out-of-bounds read by adding
a BUG() check.

Signed-off-by: Jeff King <peff@peff.net>
---
I found this because I was tracing the code path after
unpack_loose_header() returns to verify some assumptions in the other
patches.

It really makes me wonder if this "unknown type" stuff has any value
at all. You can create an object with any type using "hash-object
--literally -t". And you can ask about its type and size. But you can
never retrieve the object content! Nor can you pack it or transfer it,
since packs use a numeric type field.

This code was added ~2015, but I don't think anybody built more on top
of it. I wonder if we should just consider it a failed experiment and
rip out the support.

 object-file.c | 2 ++
 1 file changed, 2 insertions(+)
diff mbox series

Patch

diff --git a/object-file.c b/object-file.c
index 00c3a4b910..45b251ba04 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1580,6 +1580,8 @@  static int loose_object_info(struct repository *r,
 
 		if (!oi->contentp)
 			break;
+		if (hdrbuf.len)
+			BUG("unpacking content with unknown types not yet supported");
 		*oi->contentp = unpack_loose_rest(&stream, hdr, *oi->sizep, oid);
 		if (*oi->contentp)
 			goto cleanup;