Message ID | 20250225062824.GA1293961@coredump.intra.peff.net (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | some zlib inflating bug fixes | expand |
diff --git a/object-file.c b/object-file.c index 00c3a4b910..45b251ba04 100644 --- a/object-file.c +++ b/object-file.c @@ -1580,6 +1580,8 @@ static int loose_object_info(struct repository *r, if (!oi->contentp) break; + if (hdrbuf.len) + BUG("unpacking content with unknown types not yet supported"); *oi->contentp = unpack_loose_rest(&stream, hdr, *oi->sizep, oid); if (*oi->contentp) goto cleanup;
After unpack_loose_header() returns, it will have inflated not only the object header, but possibly some bytes of the object content. When we call unpack_loose_rest() to extract the actual content, it finds those extra bytes by skipping past the header's terminating NUL in the buffer. Like this: int bytes = strlen(buffer) + 1; n = stream->total_out - bytes; ... memcpy(buf, (char *) buffer + bytes, n); This won't work with the OBJECT_INFO_ALLOW_UNKNOWN_TYPE flag, as there we allow a header of arbitrary size. We put into a strbuf, but feed only the final 32-byte chunk we read to unpack_loose_rest(). In that case stream->total_out may unexpectedly large, and thus our "n" will be large, causing an out-of-bounds read (we do check it against our allocated buffer size, which prevents an out-of-bounds write). Probably this could be made to work by feeding the strbuf to unpack_loose_rest(), along with adjusting some types (e.g., "bytes" would need to be a size_t, since it is no longer operating on a 32-byte buffer). But I don't think it's possible to actually trigger this in practice. The only caller who passes ALLOW_UNKNOWN_TYPE is cat-file, which only allows it with the "-t" and "-s" options (neither of which access the content). There is one way you can _almost_ trigger it: the oid compat routines (i.e., accessing sha1 via sha256 names and vice versa) will convert objects on the fly (which requires access to the content) using the same flags that were passed in. So in theory this: t='some very large type field that causes an extra inflate call' sha1_oid=$(git hash-object -w -t "$t" file) sha256_oid=$(git rev-parse --output-object-format=sha256 $sha1_oid) git cat-file --allow-unknown-type -s $sha256_oid would try to access the content. But it doesn't work, because using compat objects requires an entry in the .git/objects/loose-object-idx file, and we don't generate such an entry for non-standard types (see the "compat" section of write_object_file_literally()). If we use "t=blob" instead, then it does access the compat object, but it doesn't trigger the problem (because "blob" is a standard short type name, and it fits in the initial 32-byte buffer). So given that this is almost a memory error bug, I think it's worth addressing. But because we can't actually trigger the situation, I'm hesitant to try a fix that we can't run. Instead let's document the restriction and protect ourselves from the out-of-bounds read by adding a BUG() check. Signed-off-by: Jeff King <peff@peff.net> --- I found this because I was tracing the code path after unpack_loose_header() returns to verify some assumptions in the other patches. It really makes me wonder if this "unknown type" stuff has any value at all. You can create an object with any type using "hash-object --literally -t". And you can ask about its type and size. But you can never retrieve the object content! Nor can you pack it or transfer it, since packs use a numeric type field. This code was added ~2015, but I don't think anybody built more on top of it. I wonder if we should just consider it a failed experiment and rip out the support. object-file.c | 2 ++ 1 file changed, 2 insertions(+)