From patchwork Fri Nov 13 05:06:48 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jeff King
X-Patchwork-Id: 11902515
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
 aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-9.7 required=3.0 tests=BAYES_00,
 HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY,
 SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no
 version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
 by smtp.lore.kernel.org (Postfix) with ESMTP id D5DA9C4742C
 for ; Fri, 13 Nov 2020 05:06:51 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
 by mail.kernel.org (Postfix) with ESMTP id 7168820B80
 for ; Fri, 13 Nov 2020 05:06:51 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S1726174AbgKMFGu (ORCPT ); Fri, 13 Nov 2020 00:06:50 -0500
Received: from cloud.peff.net ([104.130.231.41]:56906 "EHLO cloud.peff.net"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
 id S1726102AbgKMFGu (ORCPT ); Fri, 13 Nov 2020 00:06:50 -0500
Received: (qmail 23756 invoked by uid 109); 13 Nov 2020 05:06:50 -0000
Received: from Unknown (HELO peff.net) (10.0.1.2)
 by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Fri, 13 Nov 2020 05:06:50 +0000
Authentication-Results: cloud.peff.net; auth=none
Received: (qmail 6163 invoked by uid 111); 13 Nov 2020 05:06:49 -0000
Received: from coredump.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.2)
 by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS;
 Fri, 13 Nov 2020 00:06:49 -0500
Authentication-Results: peff.net; auth=none
Date: Fri, 13 Nov 2020 00:06:48 -0500
From: Jeff King
To: git@vger.kernel.org
Subject: [PATCH 1/5] compute pack .idx byte offsets using size_t
Message-ID: <20201113050648.GA744691@coredump.intra.peff.net>
References: <20201113050631.GA744608@coredump.intra.peff.net>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To: <20201113050631.GA744608@coredump.intra.peff.net>
Precedence: bulk
List-ID:
X-Mailing-List: git@vger.kernel.org

A pack and its matching .idx file are limited to 2^32 objects, because
the pack format contains a 32-bit field to store the number of objects.
Hence we use uint32_t in the code.

But the byte count of even a .idx file can be much larger than that,
because it stores at least a hash and an offset for each object. So
using SHA-1, a v2 .idx file will cross the 4GB boundary at 153,391,650
objects. This confuses load_idx(), which computes the minimum size like
this:

  unsigned long min_size = 8 + 4*256 + nr*(hashsz + 4 + 4) + hashsz + hashsz;

Even though min_size will be big enough on most 64-bit platforms, the
actual arithmetic is done as a uint32_t, resulting in a truncation. We
actually exceed that min_size, but then we do:

  unsigned long max_size = min_size;
  if (nr)
          max_size += (nr - 1)*8;

to account for the variable-sized table. That computation doesn't
overflow quite so low, but with the truncation for min_size, we end up
with a max_size that is much smaller than our actual size. So we
complain that the idx is invalid, and can't find any of its objects.

We can fix this case by casting "nr" to a size_t, which will do the
multiplication in 64-bits (assuming you're on a 64-bit platform; this
will never work on a 32-bit system since we couldn't map the whole .idx
anyway). Likewise, we don't have to worry about further additions,
because adding a smaller number to a size_t will convert the other side
to a size_t.

A few notes:

  - obviously we could just declare "nr" as a size_t in the first place
    (and likewise, packed_git.num_objects). But it's conceptually a
    uint32_t because of the on-disk format, and we correctly treat it
    that way in other contexts that don't need to compute byte offsets
    (e.g., iterating over the set of objects should and generally does
    use a uint32_t). Switching to size_t would make all of those other
    cases look wrong.

  - it could be argued that the proper type is off_t to represent the
    file offset. But in practice the .idx file must fit within memory,
    because we mmap the whole thing. And the rest of the code (including
    the idx_size variable we're comparing against) uses size_t.

  - we'll add the same cast to the max_size arithmetic line. Even though
    we're adding to a larger type, which will convert our result, the
    multiplication is still done as a 32-bit value and can itself
    overflow. I didn't check this with my test case, since it would need
    an even larger pack (~530M objects), but looking at compiler output
    shows that it works this way. The standard should agree, but I
    couldn't find anything explicit in 6.3.1.8 ("usual arithmetic
    conversions").

The case in load_idx() was the most immediate one that I was able to
trigger. After fixing it, looking up actual objects (including the very
last one in sha1 order) works in a test repo with 153,725,110 objects.
That's because bsearch_hash() works with uint32_t entry indices, and the
actual byte access:

  int cmp = hashcmp(table + mi * stride, sha1);

is done with "stride" as a size_t, causing the uint32_t "mi" to be
promoted to a size_t. This is the way most code will access the index
data.

However, I audited all of the other byte-wise accesses of
packed_git.index_data, and many of the others are suspect (they are
similar to the max_size one, where we are adding to a properly sized
offset or directly to a pointer, but the multiplication in the
sub-expression can overflow). I didn't trigger any of these in practice,
but I believe they're potential problems, and certainly adding in the
cast is not going to hurt anything here.

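To make the truncation concrete, here is a standalone sketch (not part
of the patch) of the min_size arithmetic before and after the cast. The
function names are made up for illustration, uint64_t stands in for a
64-bit size_t so the result is the same on any host, and it assumes
"unsigned int" is 32 bits wide, as it is on mainstream platforms.

```c
#include <stdint.h>

/*
 * Mirrors the pre-patch expression. Both operands of the multiplication
 * are 32-bit, so the product (and the additions around it) wrap modulo
 * 2^32 before the result is widened for the return value.
 * Assumes "unsigned int" is 32 bits.
 */
static uint64_t min_size_truncated(uint32_t nr, uint32_t hashsz)
{
	return 8 + 4*256 + nr*(hashsz + 4 + 4) + hashsz + hashsz;
}

/*
 * Mirrors the fix. Casting nr first makes the multiplication, and every
 * addition that involves its result, happen in 64 bits. uint64_t is a
 * stand-in for size_t on a 64-bit platform.
 */
static uint64_t min_size_widened(uint32_t nr, uint32_t hashsz)
{
	return 8 + 4*256 + (uint64_t)nr*(hashsz + 4 + 4) + hashsz + hashsz;
}
```

With hashsz = 20, the two functions agree for nr = 153,391,650, but one
more object pushes the true size to 4,294,967,300 bytes, which the
32-bit arithmetic wraps to 4.
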
Signed-off-by: Jeff King
---
 builtin/index-pack.c |  2 +-
 pack-check.c         |  2 +-
 pack-revindex.c      |  2 +-
 packfile.c           | 12 ++++++------
 4 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 0d03cb442d..4b8d86e0ad 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1597,7 +1597,7 @@ static void read_v2_anomalous_offsets(struct packed_git *p,

 	/* The address of the 4-byte offset table */
 	idx1 = (((const uint32_t *)((const uint8_t *)p->index_data + p->crc_offset))
-		+ p->num_objects /* CRC32 table */
+		+ (size_t)p->num_objects /* CRC32 table */
 		);

 	/* The address of the 8-byte offset table */
diff --git a/pack-check.c b/pack-check.c
index dad6d8ae7f..db3adf8781 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -39,7 +39,7 @@ int check_pack_crc(struct packed_git *p, struct pack_window **w_curs,
 	} while (len);

 	index_crc = p->index_data;
-	index_crc += 2 + 256 + p->num_objects * (the_hash_algo->rawsz/4) + nr;
+	index_crc += 2 + 256 + (size_t)p->num_objects * (the_hash_algo->rawsz/4) + nr;

 	return data_crc != ntohl(*index_crc);
 }
diff --git a/pack-revindex.c b/pack-revindex.c
index d28a7e43d0..ecdde39cf4 100644
--- a/pack-revindex.c
+++ b/pack-revindex.c
@@ -130,7 +130,7 @@ static void create_pack_revindex(struct packed_git *p)

 	if (p->index_version > 1) {
 		const uint32_t *off_32 =
-			(uint32_t *)(index + 8 + p->num_objects * (hashsz + 4));
+			(uint32_t *)(index + 8 + (size_t)p->num_objects * (hashsz + 4));
 		const uint32_t *off_64 = off_32 + p->num_objects;
 		for (i = 0; i < num_ent; i++) {
 			const uint32_t off = ntohl(*off_32++);
diff --git a/packfile.c b/packfile.c
index 0929ebe4fc..a72c2a261f 100644
--- a/packfile.c
+++ b/packfile.c
@@ -148,7 +148,7 @@ int load_idx(const char *path, const unsigned int hashsz, void *idx_map,
 		 * - hash of the packfile
 		 * - file checksum
 		 */
-		if (idx_size != 4 * 256 + nr * (hashsz + 4) + hashsz + hashsz)
+		if (idx_size != 4 * 256 + (size_t)nr * (hashsz + 4) + hashsz + hashsz)
 			return error("wrong index v1 file size in %s", path);
 	} else if (version == 2) {
 		/*
@@ -164,10 +164,10 @@ int load_idx(const char *path, const unsigned int hashsz, void *idx_map,
 		 * variable sized table containing 8-byte entries
 		 * for offsets larger than 2^31.
 		 */
-		unsigned long min_size = 8 + 4*256 + nr*(hashsz + 4 + 4) + hashsz + hashsz;
+		unsigned long min_size = 8 + 4*256 + (size_t)nr*(hashsz + 4 + 4) + hashsz + hashsz;
 		unsigned long max_size = min_size;
 		if (nr)
-			max_size += (nr - 1)*8;
+			max_size += ((size_t)nr - 1)*8;
 		if (idx_size < min_size || idx_size > max_size)
 			return error("wrong index v2 file size in %s", path);
 		if (idx_size != min_size &&
@@ -1933,14 +1933,14 @@ off_t nth_packed_object_offset(const struct packed_git *p, uint32_t n)
 	const unsigned int hashsz = the_hash_algo->rawsz;
 	index += 4 * 256;
 	if (p->index_version == 1) {
-		return ntohl(*((uint32_t *)(index + (hashsz + 4) * n)));
+		return ntohl(*((uint32_t *)(index + (hashsz + 4) * (size_t)n)));
 	} else {
 		uint32_t off;
-		index += 8 + p->num_objects * (hashsz + 4);
+		index += 8 + (size_t)p->num_objects * (hashsz + 4);
 		off = ntohl(*((uint32_t *)(index + 4 * n)));
 		if (!(off & 0x80000000))
 			return off;
-		index += p->num_objects * 4 + (off & 0x7fffffff) * 8;
+		index += (size_t)p->num_objects * 4 + (off & 0x7fffffff) * 8;
 		check_pack_index_ptr(p, index);
 		return get_be64(index);
 	}

From patchwork Fri Nov 13 05:07:01 2020
X-Patchwork-Submitter: Jeff King
X-Patchwork-Id: 11902509
Date: Fri, 13 Nov 2020 00:07:01 -0500
From: Jeff King
To: git@vger.kernel.org
Subject: [PATCH 2/5] use size_t to store pack .idx byte offsets
Message-ID: <20201113050701.GB744691@coredump.intra.peff.net>
References: <20201113050631.GA744608@coredump.intra.peff.net>
In-Reply-To: <20201113050631.GA744608@coredump.intra.peff.net>

We sometimes store the offset into a pack .idx file as an "unsigned
long", but the mmap'd size of a pack .idx file can exceed 4GB. This is
sufficient on LP64 systems like Linux, but will be too small on LLP64
systems like Windows, where "unsigned long" is still only 32 bits. Let's
use size_t, which is a better type for an offset into a memory buffer.

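As a rough illustration of the LP64 versus LLP64 difference (not part of
the patch), the sketch below asks whether a given byte offset is
representable in each type on the platform where it is compiled. The
helper names are hypothetical.

```c
#include <limits.h>
#include <stdint.h>

/* Returns 1 if the offset can be stored in "unsigned long" on this platform. */
static int fits_in_unsigned_long(uint64_t offset)
{
	return offset <= ULONG_MAX;
}

/* Returns 1 if the offset can be stored in size_t on this platform. */
static int fits_in_size_t(uint64_t offset)
{
	return offset <= SIZE_MAX;
}
```

For an offset just past 4GB, both helpers return 1 on LP64. On LLP64
only fits_in_size_t() does, which is the gap this patch closes.
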
Signed-off-by: Jeff King
---
 builtin/pack-redundant.c | 6 +++---
 packfile.c               | 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/builtin/pack-redundant.c b/builtin/pack-redundant.c
index 178e3409b7..3e70f2a4c1 100644
--- a/builtin/pack-redundant.c
+++ b/builtin/pack-redundant.c
@@ -236,7 +236,7 @@ static struct pack_list * pack_list_difference(const struct pack_list *A,

 static void cmp_two_packs(struct pack_list *p1, struct pack_list *p2)
 {
-	unsigned long p1_off = 0, p2_off = 0, p1_step, p2_step;
+	size_t p1_off = 0, p2_off = 0, p1_step, p2_step;
 	const unsigned char *p1_base, *p2_base;
 	struct llist_item *p1_hint = NULL, *p2_hint = NULL;
 	const unsigned int hashsz = the_hash_algo->rawsz;
@@ -280,7 +280,7 @@ static void cmp_two_packs(struct pack_list *p1, struct pack_list *p2)
 static size_t sizeof_union(struct packed_git *p1, struct packed_git *p2)
 {
 	size_t ret = 0;
-	unsigned long p1_off = 0, p2_off = 0, p1_step, p2_step;
+	size_t p1_off = 0, p2_off = 0, p1_step, p2_step;
 	const unsigned char *p1_base, *p2_base;
 	const unsigned int hashsz = the_hash_algo->rawsz;

@@ -499,7 +499,7 @@ static void scan_alt_odb_packs(void)
 static struct pack_list * add_pack(struct packed_git *p)
 {
 	struct pack_list l;
-	unsigned long off = 0, step;
+	size_t off = 0, step;
 	const unsigned char *base;

 	if (!p->pack_local && !(alt_odb || verbose))
diff --git a/packfile.c b/packfile.c
index a72c2a261f..63fe9ee8be 100644
--- a/packfile.c
+++ b/packfile.c
@@ -164,8 +164,8 @@ int load_idx(const char *path, const unsigned int hashsz, void *idx_map,
 		 * variable sized table containing 8-byte entries
 		 * for offsets larger than 2^31.
 		 */
-		unsigned long min_size = 8 + 4*256 + (size_t)nr*(hashsz + 4 + 4) + hashsz + hashsz;
-		unsigned long max_size = min_size;
+		size_t min_size = 8 + 4*256 + (size_t)nr*(hashsz + 4 + 4) + hashsz + hashsz;
+		size_t max_size = min_size;
 		if (nr)
 			max_size += ((size_t)nr - 1)*8;
 		if (idx_size < min_size || idx_size > max_size)

From patchwork Fri Nov 13 05:07:14 2020
X-Patchwork-Submitter: Jeff King
X-Patchwork-Id: 11902513
Date: Fri, 13 Nov 2020 00:07:14 -0500
From: Jeff King
To: git@vger.kernel.org
Subject: [PATCH 3/5] fsck: correctly compute checksums on idx files larger than 4GB
Message-ID: <20201113050714.GC744691@coredump.intra.peff.net>
References: <20201113050631.GA744608@coredump.intra.peff.net>
In-Reply-To: <20201113050631.GA744608@coredump.intra.peff.net>

When checking the trailing checksum hash of a .idx file, we pass the
whole buffer (minus the trailing hash) into a single call to
the_hash_algo->update_fn(). But we cast it to an "unsigned int". This
comes from c4001d92be (Use off_t when we really mean a file offset.,
2007-03-06). That commit started storing the index_size variable as an
off_t, but our mozilla-sha1 implementation from the time was limited to
a smaller size. Presumably the cast was a way of annotating that we
expected .idx files to be small, and so we didn't need to loop (as we do
for arbitrarily-large .pack files). Though as an aside it was still
wrong, because the mozilla function actually took a signed int.

These days our hash-update functions are defined to take a size_t, so we
can pass the whole buffer in directly. The cast is actually causing a
buggy truncation!

While we're here, though, let's drop the confusing off_t variable in the
first place. We're getting the size not from the filesystem anyway, but
from p->index_size, which is a size_t. In fact, we can make the code a
bit more readable by dropping our local variable duplicating
p->index_size, and instead have one that stores the size of the actual
index data, minus the trailing hash.

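The effect of the narrowing cast can be shown with a scaled-down model
(not part of the patch). Everything here is a stand-in: toy_hash() is a
simple FNV-1a checksum playing the role of the_hash_algo->update_fn()
plus final_fn(), and a uint16_t cast plays the role of the "unsigned
int" cast so that the problem shows up with a buffer of about 64KB
instead of 4GB.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy FNV-1a checksum over len bytes; a hypothetical stand-in for the real hash. */
static uint32_t toy_hash(const unsigned char *buf, size_t len)
{
	uint32_t h = 2166136261u;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= buf[i];
		h *= 16777619u;
	}
	return h;
}

/*
 * Scaled-down model of the bug: the length is narrowed before it reaches
 * the hash function, so only (len mod 65536) bytes are hashed.
 */
static uint32_t toy_hash_with_narrow_cast(const unsigned char *buf, size_t len)
{
	return toy_hash(buf, (uint16_t)len);
}
```

For a buffer of 65536 + 10 bytes, toy_hash_with_narrow_cast() covers
only the first 10 bytes, so its result cannot reflect the rest of the
data. In the real code this is why the index checksum was wrong for .idx
files above 4GB.
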
Signed-off-by: Jeff King
---
 pack-check.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/pack-check.c b/pack-check.c
index db3adf8781..4b089fe8ec 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -164,22 +164,22 @@ static int verify_packfile(struct repository *r,

 int verify_pack_index(struct packed_git *p)
 {
-	off_t index_size;
+	size_t len;
 	const unsigned char *index_base;
 	git_hash_ctx ctx;
 	unsigned char hash[GIT_MAX_RAWSZ];
 	int err = 0;

 	if (open_pack_index(p))
 		return error("packfile %s index not opened", p->pack_name);
-	index_size = p->index_size;
 	index_base = p->index_data;
+	len = p->index_size - the_hash_algo->rawsz;

 	/* Verify SHA1 sum of the index file */
 	the_hash_algo->init_fn(&ctx);
-	the_hash_algo->update_fn(&ctx, index_base, (unsigned int)(index_size - the_hash_algo->rawsz));
+	the_hash_algo->update_fn(&ctx, index_base, len);
 	the_hash_algo->final_fn(hash, &ctx);
-	if (!hasheq(hash, index_base + index_size - the_hash_algo->rawsz))
+	if (!hasheq(hash, index_base + len))
 		err = error("Packfile index for %s hash mismatch",
 			    p->pack_name);
 	return err;

From patchwork Fri Nov 13 05:07:17 2020
X-Patchwork-Submitter: Jeff King
X-Patchwork-Id: 11902511
Date: Fri, 13 Nov 2020 00:07:17 -0500
From: Jeff King
To: git@vger.kernel.org
Subject: [PATCH 4/5] block-sha1: take a size_t length parameter
Message-ID: <20201113050717.GD744691@coredump.intra.peff.net>
References: <20201113050631.GA744608@coredump.intra.peff.net>
In-Reply-To: <20201113050631.GA744608@coredump.intra.peff.net>

The block-sha1 implementation takes an "unsigned long" for the length of
a buffer to hash, but our hash algorithm wrappers take a size_t, as do
other implementations we support like openssl or sha1dc. On many
systems, including Linux, these two are equivalent, but they are not on
Windows (where only a "long long" is 64 bits). As a result, passing
large chunks to a single the_hash_algo->update_fn() would produce wrong
answers there.

Note that we don't need to update any other sizes outside of the
function interface. We store the cumulative size in a "long long" (which
we must do since we hash things bigger than 4GB, like packfiles, even on
32-bit platforms). And internally, we break that size_t len down into
64-byte blocks to feed into the guts of the algorithm.

Signed-off-by: Jeff King
---
 block-sha1/sha1.c | 2 +-
 block-sha1/sha1.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/block-sha1/sha1.c b/block-sha1/sha1.c
index 22b125cf8c..8681031402 100644
--- a/block-sha1/sha1.c
+++ b/block-sha1/sha1.c
@@ -203,7 +203,7 @@ void blk_SHA1_Init(blk_SHA_CTX *ctx)
 	ctx->H[4] = 0xc3d2e1f0;
 }

-void blk_SHA1_Update(blk_SHA_CTX *ctx, const void *data, unsigned long len)
+void blk_SHA1_Update(blk_SHA_CTX *ctx, const void *data, size_t len)
 {
 	unsigned int lenW = ctx->size & 63;

diff --git a/block-sha1/sha1.h b/block-sha1/sha1.h
index 4df6747752..9fb0441b98 100644
--- a/block-sha1/sha1.h
+++ b/block-sha1/sha1.h
@@ -13,7 +13,7 @@ typedef struct {
 } blk_SHA_CTX;

 void blk_SHA1_Init(blk_SHA_CTX *ctx);
-void blk_SHA1_Update(blk_SHA_CTX *ctx, const void *dataIn, unsigned long len);
+void blk_SHA1_Update(blk_SHA_CTX *ctx, const void *dataIn, size_t len);
 void blk_SHA1_Final(unsigned char hashout[20], blk_SHA_CTX *ctx);

 #define platform_SHA_CTX blk_SHA_CTX

From patchwork Fri Nov 13 05:07:19 2020
X-Patchwork-Submitter: Jeff King
X-Patchwork-Id: 11902507
Date: Fri, 13 Nov 2020 00:07:19 -0500
From: Jeff King
To: git@vger.kernel.org
Subject: [PATCH 5/5] packfile: detect overflow in .idx file size checks
Message-ID: <20201113050719.GE744691@coredump.intra.peff.net>
References: <20201113050631.GA744608@coredump.intra.peff.net>
In-Reply-To: <20201113050631.GA744608@coredump.intra.peff.net>

In load_idx(), we check that the .idx file is sized appropriately for
the number of objects it claims to have. We recently fixed the case
where the number of objects caused our expected size to overflow a
32-bit unsigned int, and we switched to size_t.

On a 64-bit system, this is fine; our size_t covers any expected size.
On a 32-bit system, though, it won't. The file may claim to have 2^31
objects, which will overflow even a size_t.

This doesn't hurt us at all for a well-formed idx file. A 32-bit system
would already have failed to mmap such a file, since it would be too
big. But an .idx file which _claims_ to have 2^31 objects but is
actually much smaller would fool our check.

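As a sketch of how a wrapped size can match a tiny file (not part of the
patch), the code below models a 32-bit size_t with uint32_t so the
wraparound is visible on any host. The type and function names are
hypothetical, and the checked variable detects overflow by computing in
64 bits first, which is only one possible way to do it and is not
necessarily how git's own helpers work. It assumes small hash sizes such
as 20 or 32, so the 64-bit intermediate cannot itself overflow.

```c
#include <stdint.h>

/* Hypothetical model of size_t on a 32-bit platform. */
typedef uint32_t size32;

/* Exact v2 minimum size, computed in 64 bits. */
static uint64_t v2_min_size_exact(uint32_t nr, uint32_t hashsz)
{
	return 8 + 4*256 + (uint64_t)nr*(hashsz + 4 + 4) + hashsz + hashsz;
}

/* What unchecked 32-bit arithmetic yields: only the low 32 bits survive. */
static size32 v2_min_size_wrapping(uint32_t nr, uint32_t hashsz)
{
	return (size32)v2_min_size_exact(nr, hashsz);
}

/*
 * Checked variant: returns 0 and stores the size in *out if it fits in
 * the modeled 32-bit size_t, or returns -1 if the result would wrap.
 */
static int v2_min_size_checked(uint32_t nr, uint32_t hashsz, size32 *out)
{
	uint64_t exact = v2_min_size_exact(nr, hashsz);

	if (exact > UINT32_MAX)
		return -1;
	*out = (size32)exact;
	return 0;
}
```

With hashsz = 20, a file claiming 2^30 + 1 objects has a wrapped minimum
size of only 1,100 bytes, so a file of that length would pass the
unchecked comparison. The checked variant reports the overflow instead.
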
This is a broken file, and for the most part we don't care that much
what happens. But:

  - it's a little friendlier to notice up front "woah, this file is
    broken" than it is to get nonsense results

  - later access of the data assumes that the loading function
    sanity-checked that we have at least enough bytes for the regular
    object-id table. A malformed .idx file could lead to an
    out-of-bounds read.

So let's use our overflow-checking functions to make sure that we're not
fooled by a malformed file.

Signed-off-by: Jeff King
---
 packfile.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/packfile.c b/packfile.c
index 63fe9ee8be..9702b1218b 100644
--- a/packfile.c
+++ b/packfile.c
@@ -148,7 +148,7 @@ int load_idx(const char *path, const unsigned int hashsz, void *idx_map,
 		 * - hash of the packfile
 		 * - file checksum
 		 */
-		if (idx_size != 4 * 256 + (size_t)nr * (hashsz + 4) + hashsz + hashsz)
+		if (idx_size != st_add(4 * 256 + hashsz + hashsz, st_mult(nr, hashsz + 4)))
 			return error("wrong index v1 file size in %s", path);
 	} else if (version == 2) {
 		/*
@@ -164,10 +164,10 @@ int load_idx(const char *path, const unsigned int hashsz, void *idx_map,
 		 * variable sized table containing 8-byte entries
 		 * for offsets larger than 2^31.
 		 */
-		size_t min_size = 8 + 4*256 + (size_t)nr*(hashsz + 4 + 4) + hashsz + hashsz;
+		size_t min_size = st_add(8 + 4*256 + hashsz + hashsz, st_mult(nr, hashsz + 4 + 4));
 		size_t max_size = min_size;
 		if (nr)
-			max_size += ((size_t)nr - 1)*8;
+			max_size = st_add(max_size, st_mult(nr - 1, 8));
 		if (idx_size < min_size || idx_size > max_size)
 			return error("wrong index v2 file size in %s", path);
 		if (idx_size != min_size &&