From patchwork Thu Nov 9 07:09:48 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13450680 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E51CD4C7F for ; Thu, 9 Nov 2023 07:09:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=none Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 58829211D for ; Wed, 8 Nov 2023 23:09:49 -0800 (PST) Received: (qmail 25253 invoked by uid 109); 9 Nov 2023 07:09:48 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Thu, 09 Nov 2023 07:09:48 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 20460 invoked by uid 111); 9 Nov 2023 07:09:52 -0000 Received: from coredump.intra.peff.net (HELO coredump.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Thu, 09 Nov 2023 02:09:52 -0500 Authentication-Results: peff.net; auth=none Date: Thu, 9 Nov 2023 02:09:48 -0500 From: Jeff King To: git@vger.kernel.org Cc: Taylor Blau Subject: [PATCH 1/9] commit-graph: handle overflow in chunk_size checks Message-ID: <20231109070948.GA2698043@coredump.intra.peff.net> References: <20231109070310.GA2697602@coredump.intra.peff.net> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20231109070310.GA2697602@coredump.intra.peff.net> We check the size of chunks with fixed records by multiplying the width of each record by the number of commits in the file. Like: if (chunk_size != g->num_commits * GRAPH_DATA_WIDTH) If this multiplication overflows, we may not notice a chunk is too small (which could later lead to out-of-bound reads). In the current code this is only possible for the CDAT chunk, but the reasons are quite subtle. We compute g->num_commits by dividing the size of the OIDL chunk by the hash length (since it consists of a bunch of hashes). So we know that any size_t multiplication that uses a value smaller than the hash length cannot overflow. And the CDAT records are the only ones that are larger (the others are just 4-byte records). So it's worth fixing all of these, to make it clear that they're not subject to overflow (without having to reason about seemingly unrelated code). The obvious thing to do is add an st_mult(), like: if (chunk_size != st_mult(g->num_commits, GRAPH_DATA_WIDTH)) And that certainly works, but it has one downside: if we detect an overflow, we'll immediately die(). But the commit graph is an optional file; if we run into other problems loading it, we'll generally return an error and fall back to accessing the full objects. Using st_mult() means a malformed file will abort the whole process. So instead, we can do a division like this: if (chunk_size / GRAPH_DATA_WIDTH != g->num_commits) where there's no possibility of overflow. We do lose a little bit of precision; due to integer division truncation we'd allow up to an extra GRAPH_DATA_WIDTH-1 bytes of data in the chunk. That's OK. Our main goal here is making sure we don't have too _few_ bytes, which would cause an out-of-bounds read (we could actually replace our "!=" with "<", but I think it's worth being a little pedantic, as a large mismatch could be a sign of other problems). I didn't add a test here. We'd need to generate a very large graph file in order to get g->num_commits large enough to cause an overflow. And a later patch in this series will use this same division technique in a way that is much easier to trigger in the tests. Signed-off-by: Jeff King --- There are still several st_mult() calls within commit-graph.c, unrelated to my jk/chunk-bounds series. I suspect they're all redundant, as the chunk-size checks give a stricter bound. But checking and removing them is a separate topic. I think the midx code has similar st_mult() problems, but it's quite happy to die() on any error already. So I've left auditing that for another day. Another alternative to the division is introducing an st_mult() variant like: if (!st_mult(&out, &a, &b)) return error(...); We may want to go there in the long run as part of lib-ification, but I didn't want to get into it for this series. commit-graph.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index ee66098e07..5d7d7a89e5 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -344,7 +344,7 @@ static int graph_read_commit_data(const unsigned char *chunk_start, size_t chunk_size, void *data) { struct commit_graph *g = data; - if (chunk_size != g->num_commits * GRAPH_DATA_WIDTH) + if (chunk_size / GRAPH_DATA_WIDTH != g->num_commits) return error("commit-graph commit data chunk is wrong size"); g->chunk_commit_data = chunk_start; return 0; @@ -354,7 +354,7 @@ static int graph_read_generation_data(const unsigned char *chunk_start, size_t chunk_size, void *data) { struct commit_graph *g = data; - if (chunk_size != g->num_commits * sizeof(uint32_t)) + if (chunk_size / sizeof(uint32_t) != g->num_commits) return error("commit-graph generations chunk is wrong size"); g->chunk_generation_data = chunk_start; return 0; @@ -364,7 +364,7 @@ static int graph_read_bloom_index(const unsigned char *chunk_start, size_t chunk_size, void *data) { struct commit_graph *g = data; - if (chunk_size != g->num_commits * 4) { + if (chunk_size / 4 != g->num_commits) { warning("commit-graph changed-path index chunk is too small"); return -1; } From patchwork Thu Nov 9 07:12:07 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13450685 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 84534D2FE for ; Thu, 9 Nov 2023 07:12:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=none Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C81252590 for ; Wed, 8 Nov 2023 23:12:08 -0800 (PST) Received: (qmail 25292 invoked by uid 109); 9 Nov 2023 07:12:08 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Thu, 09 Nov 2023 07:12:08 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 20580 invoked by uid 111); 9 Nov 2023 07:12:12 -0000 Received: from coredump.intra.peff.net (HELO coredump.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Thu, 09 Nov 2023 02:12:12 -0500 Authentication-Results: peff.net; auth=none Date: Thu, 9 Nov 2023 02:12:07 -0500 From: Jeff King To: git@vger.kernel.org Cc: Taylor Blau Subject: [PATCH 2/9] midx: check consistency of fanout table Message-ID: <20231109071207.GB2698043@coredump.intra.peff.net> References: <20231109070310.GA2697602@coredump.intra.peff.net> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20231109070310.GA2697602@coredump.intra.peff.net> The commit-graph, midx, and pack idx on-disk formats all have oid fanout tables which are fed to bsearch_hash(). If these tables do not increase monotonically, then the binary search may not only produce bogus values, it may cause out of bounds reads. We fixed this for commit graphs in 4169d89645 (commit-graph: check consistency of fanout table, 2023-10-09). That commit argued that we did not need to do the same for midx and pack idx files, because they already did this check. However, that is wrong. We _do_ check the fanout table for pack idx files when we load them, but we only do so for midx files when running "git multi-pack-index verify". So it is possible to get an out-of-bounds read by running a normal command with a specially crafted midx file. Let's fix this using the same solution (and roughly the same test) we did for the commit-graph in 4169d89645. This replaces the same check from "multi-pack-index verify", because verify uses the same read routines, we'd bail on reading the midx much sooner now. So let's make sure to copy its verbose error message. Signed-off-by: Jeff King --- midx.c | 20 +++++++++++--------- t/t5319-multi-pack-index.sh | 14 ++++++++++++++ 2 files changed, 25 insertions(+), 9 deletions(-) diff --git a/midx.c b/midx.c index 2f3863c936..1d14661dad 100644 --- a/midx.c +++ b/midx.c @@ -64,13 +64,24 @@ void get_midx_rev_filename(struct strbuf *out, struct multi_pack_index *m) static int midx_read_oid_fanout(const unsigned char *chunk_start, size_t chunk_size, void *data) { + int i; struct multi_pack_index *m = data; m->chunk_oid_fanout = (uint32_t *)chunk_start; if (chunk_size != 4 * 256) { error(_("multi-pack-index OID fanout is of the wrong size")); return 1; } + for (i = 0; i < 255; i++) { + uint32_t oid_fanout1 = ntohl(m->chunk_oid_fanout[i]); + uint32_t oid_fanout2 = ntohl(m->chunk_oid_fanout[i+1]); + + if (oid_fanout1 > oid_fanout2) { + error(_("oid fanout out of order: fanout[%d] = %"PRIx32" > %"PRIx32" = fanout[%d]"), + i, oid_fanout1, oid_fanout2, i + 1); + return 1; + } + } m->num_objects = ntohl(m->chunk_oid_fanout[255]); return 0; } @@ -1782,15 +1793,6 @@ int verify_midx_file(struct repository *r, const char *object_dir, unsigned flag } stop_progress(&progress); - for (i = 0; i < 255; i++) { - uint32_t oid_fanout1 = ntohl(m->chunk_oid_fanout[i]); - uint32_t oid_fanout2 = ntohl(m->chunk_oid_fanout[i + 1]); - - if (oid_fanout1 > oid_fanout2) - midx_report(_("oid fanout out of order: fanout[%d] = %"PRIx32" > %"PRIx32" = fanout[%d]"), - i, oid_fanout1, oid_fanout2, i + 1); - } - if (m->num_objects == 0) { midx_report(_("the midx contains no oid")); /* diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index c4c6060cee..c20aafe99a 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -1157,4 +1157,18 @@ test_expect_success 'reader notices too-small revindex chunk' ' test_cmp expect.err err ' +test_expect_success 'reader notices out-of-bounds fanout' ' + # This is similar to the out-of-bounds fanout test in t5318. The values + # in adjacent entries should be large but not identical (they + # are used as hi/lo starts for a binary search, which would then abort + # immediately). + corrupt_chunk OIDF 0 $(printf "%02x000000" $(test_seq 0 254)) && + test_must_fail git log 2>err && + cat >expect <<-\EOF && + error: oid fanout out of order: fanout[254] = fe000000 > 5c = fanout[255] + fatal: multi-pack-index required OID fanout chunk missing or corrupted + EOF + test_cmp expect err +' + test_done From patchwork Thu Nov 9 07:13:12 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13450686 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6F8176AA7 for ; Thu, 9 Nov 2023 07:13:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=none Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B076A2590 for ; Wed, 8 Nov 2023 23:13:13 -0800 (PST) Received: (qmail 25347 invoked by uid 109); 9 Nov 2023 07:13:13 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Thu, 09 Nov 2023 07:13:13 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 20605 invoked by uid 111); 9 Nov 2023 07:13:17 -0000 Received: from coredump.intra.peff.net (HELO coredump.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Thu, 09 Nov 2023 02:13:17 -0500 Authentication-Results: peff.net; auth=none Date: Thu, 9 Nov 2023 02:13:12 -0500 From: Jeff King To: git@vger.kernel.org Cc: Taylor Blau Subject: [PATCH 3/9] commit-graph: drop redundant call to "lite" verification Message-ID: <20231109071312.GC2698043@coredump.intra.peff.net> References: <20231109070310.GA2697602@coredump.intra.peff.net> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20231109070310.GA2697602@coredump.intra.peff.net> The idea of verify_commit_graph_lite() is to have cheap verification checks both for everyday use of the graph files (to avoid out of bounds reads, etc) as well as for doing a full check via "commit-graph verify" (which will also check the hash, etc). But the expensive verification checks operate on a commit_graph struct, which we get by using the normal everyday-reader code! So any problem we'd find by calling it would have been found before we even got to the verify_one_commit_graph() function. Removing it simplifies the code a bit, but also frees us up to move the "lite" verification steps around within that everyday-reader code. Signed-off-by: Jeff King --- commit-graph.c | 4 ---- 1 file changed, 4 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 5d7d7a89e5..d9fc08de86 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -2690,10 +2690,6 @@ static int verify_one_commit_graph(struct repository *r, struct commit *seen_gen_zero = NULL; struct commit *seen_gen_non_zero = NULL; - verify_commit_graph_error = verify_commit_graph_lite(g); - if (verify_commit_graph_error) - return verify_commit_graph_error; - if (!commit_graph_checksum_valid(g)) { graph_report(_("the commit-graph file has incorrect checksum and is likely corrupt")); verify_commit_graph_error = VERIFY_COMMIT_GRAPH_ERROR_HASH; From patchwork Thu Nov 9 07:14:34 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13450687 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8CCC9DDAC for ; Thu, 9 Nov 2023 07:14:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=none Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 06B892590 for ; Wed, 8 Nov 2023 23:14:35 -0800 (PST) Received: (qmail 25363 invoked by uid 109); 9 Nov 2023 07:14:35 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Thu, 09 Nov 2023 07:14:35 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 20635 invoked by uid 111); 9 Nov 2023 07:14:39 -0000 Received: from coredump.intra.peff.net (HELO coredump.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Thu, 09 Nov 2023 02:14:39 -0500 Authentication-Results: peff.net; auth=none Date: Thu, 9 Nov 2023 02:14:34 -0500 From: Jeff King To: git@vger.kernel.org Cc: Taylor Blau Subject: [PATCH 4/9] commit-graph: clarify missing-chunk error messages Message-ID: <20231109071434.GD2698043@coredump.intra.peff.net> References: <20231109070310.GA2697602@coredump.intra.peff.net> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20231109070310.GA2697602@coredump.intra.peff.net> When a required commit-graph chunk cannot be loaded, we leave its entry in the struct NULL, and then later complain that it is missing. But that's just one reason we might not have loaded it, as we also do some data quality checks. Let's switch these messages to say "missing or corrupted", which is exactly what the midx code says for the same cases. Likewise, we'll use the same phrasing and capitalization as those for consistency. And while we're here, we can mark them for translation (just like the midx ones). Signed-off-by: Jeff King --- I went with "corrupted" here for consistency with the others (versus "corrupt"). If we think there's a reason to prefer one over the other, I'm happy to take a patch on top fixing all of them. commit-graph.c | 6 +++--- t/t5318-commit-graph.sh | 10 +++++----- 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index d9fc08de86..989ebbe816 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -292,15 +292,15 @@ static int verify_commit_graph_lite(struct commit_graph *g) * itself. */ if (!g->chunk_oid_fanout) { - error("commit-graph is missing the OID Fanout chunk"); + error(_("commit-graph required OID fanout chunk missing or corrupted")); return 1; } if (!g->chunk_oid_lookup) { - error("commit-graph is missing the OID Lookup chunk"); + error(_("commit-graph required OID lookup chunk missing or corrupted")); return 1; } if (!g->chunk_commit_data) { - error("commit-graph is missing the Commit Data chunk"); + error(_("commit-graph required commit data chunk missing or corrupted")); return 1; } diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index d4fc65e078..affb959d64 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -540,17 +540,17 @@ test_expect_success 'detect low chunk count' ' test_expect_success 'detect missing OID fanout chunk' ' corrupt_graph_and_verify $GRAPH_BYTE_OID_FANOUT_ID "\0" \ - "missing the OID Fanout chunk" + "commit-graph required OID fanout chunk missing or corrupted" ' test_expect_success 'detect missing OID lookup chunk' ' corrupt_graph_and_verify $GRAPH_BYTE_OID_LOOKUP_ID "\0" \ - "missing the OID Lookup chunk" + "commit-graph required OID lookup chunk missing or corrupted" ' test_expect_success 'detect missing commit data chunk' ' corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_DATA_ID "\0" \ - "missing the Commit Data chunk" + "commit-graph required commit data chunk missing or corrupted" ' test_expect_success 'detect incorrect fanout' ' @@ -842,7 +842,7 @@ test_expect_success 'reader notices too-small oid fanout chunk' ' check_corrupt_chunk OIDF clear $(printf "000000%02x" $(test_seq 250)) && cat >expect.err <<-\EOF && error: commit-graph oid fanout chunk is wrong size - error: commit-graph is missing the OID Fanout chunk + error: commit-graph required OID fanout chunk missing or corrupted EOF test_cmp expect.err err ' @@ -874,7 +874,7 @@ test_expect_success 'reader notices too-small commit data chunk' ' check_corrupt_chunk CDAT clear 00000000 && cat >expect.err <<-\EOF && error: commit-graph commit data chunk is wrong size - error: commit-graph is missing the Commit Data chunk + error: commit-graph required commit data chunk missing or corrupted EOF test_cmp expect.err err ' From patchwork Thu Nov 9 07:17:11 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13450688 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 20F50D2FE for ; Thu, 9 Nov 2023 07:17:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=none Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6AE7093 for ; Wed, 8 Nov 2023 23:17:12 -0800 (PST) Received: (qmail 25399 invoked by uid 109); 9 Nov 2023 07:17:11 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Thu, 09 Nov 2023 07:17:11 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 20729 invoked by uid 111); 9 Nov 2023 07:17:15 -0000 Received: from coredump.intra.peff.net (HELO coredump.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Thu, 09 Nov 2023 02:17:15 -0500 Authentication-Results: peff.net; auth=none Date: Thu, 9 Nov 2023 02:17:11 -0500 From: Jeff King To: git@vger.kernel.org Cc: Taylor Blau Subject: [PATCH 5/9] commit-graph: abort as soon as we see a bogus chunk Message-ID: <20231109071711.GE2698043@coredump.intra.peff.net> References: <20231109070310.GA2697602@coredump.intra.peff.net> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20231109070310.GA2697602@coredump.intra.peff.net> The code to read commit-graph files tries to read all of the required chunks, but doesn't abort if we can't find one (or if it's corrupted). It's only at the end of reading the file that we then do some sanity checks for NULL entries. But it's preferable to detect the errors and bail immediately, for a few reasons: 1. It's less error-prone. It's easy in the reader functions to flag an error but still end up setting some struct fields (an error I in fact made while working on this patch series). 2. It's safer. Since verifying some chunks depends on the values of other chunks, we may be depending on not-yet-verified data. I don't know offhand of any case where this can cause problems, but it's one less subtle thing to worry about in the reader code. 3. It prevents the user from seeing nonsense errors. If we're missing an OIDL chunk, then g->num_commits will be zero. And so we may complain that the size of our CDAT chunk (which should have a fixed-size record for each commit) is wrong unless it's also zero. But that's misleading; the problem is the missing OIDL chunk; the CDAT one might be fine! So let's just check the return value from read_chunk(). This is exactly how the midx chunk-reading code does it. Signed-off-by: Jeff King --- commit-graph.c | 29 +++++++++++++---------------- 1 file changed, 13 insertions(+), 16 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 989ebbe816..374575b484 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -291,19 +291,6 @@ static int verify_commit_graph_lite(struct commit_graph *g) * over g->num_commits, or runs a checksum on the commit-graph * itself. */ - if (!g->chunk_oid_fanout) { - error(_("commit-graph required OID fanout chunk missing or corrupted")); - return 1; - } - if (!g->chunk_oid_lookup) { - error(_("commit-graph required OID lookup chunk missing or corrupted")); - return 1; - } - if (!g->chunk_commit_data) { - error(_("commit-graph required commit data chunk missing or corrupted")); - return 1; - } - for (i = 0; i < 255; i++) { uint32_t oid_fanout1 = ntohl(g->chunk_oid_fanout[i]); uint32_t oid_fanout2 = ntohl(g->chunk_oid_fanout[i + 1]); @@ -462,9 +449,19 @@ struct commit_graph *parse_commit_graph(struct repo_settings *s, GRAPH_HEADER_SIZE, graph->num_chunks, 1)) goto free_and_return; - read_chunk(cf, GRAPH_CHUNKID_OIDFANOUT, graph_read_oid_fanout, graph); - read_chunk(cf, GRAPH_CHUNKID_OIDLOOKUP, graph_read_oid_lookup, graph); - read_chunk(cf, GRAPH_CHUNKID_DATA, graph_read_commit_data, graph); + if (read_chunk(cf, GRAPH_CHUNKID_OIDFANOUT, graph_read_oid_fanout, graph)) { + error(_("commit-graph required OID fanout chunk missing or corrupted")); + goto free_and_return; + } + if (read_chunk(cf, GRAPH_CHUNKID_OIDLOOKUP, graph_read_oid_lookup, graph)) { + error(_("commit-graph required OID lookup chunk missing or corrupted")); + goto free_and_return; + } + if (read_chunk(cf, GRAPH_CHUNKID_DATA, graph_read_commit_data, graph)) { + error(_("commit-graph required commit data chunk missing or corrupted")); + goto free_and_return; + } + pair_chunk(cf, GRAPH_CHUNKID_EXTRAEDGES, &graph->chunk_extra_edges, &graph->chunk_extra_edges_size); pair_chunk(cf, GRAPH_CHUNKID_BASE, &graph->chunk_base_graphs, From patchwork Thu Nov 9 07:24:35 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13450689 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E9D4DD532 for ; Thu, 9 Nov 2023 07:24:37 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=none Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 49EE42126 for ; Wed, 8 Nov 2023 23:24:37 -0800 (PST) Received: (qmail 25484 invoked by uid 109); 9 Nov 2023 07:24:36 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Thu, 09 Nov 2023 07:24:36 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 20872 invoked by uid 111); 9 Nov 2023 07:24:40 -0000 Received: from coredump.intra.peff.net (HELO coredump.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Thu, 09 Nov 2023 02:24:40 -0500 Authentication-Results: peff.net; auth=none Date: Thu, 9 Nov 2023 02:24:35 -0500 From: Jeff King To: git@vger.kernel.org Cc: Taylor Blau Subject: [PATCH 6/9] commit-graph: use fanout value for graph size Message-ID: <20231109072435.GF2698043@coredump.intra.peff.net> References: <20231109070310.GA2697602@coredump.intra.peff.net> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20231109070310.GA2697602@coredump.intra.peff.net> Commit-graph, midx, and pack idx files all have both a lookup table of oids and an oid fanout table. In midx and pack idx files, we take the final entry of the fanout table as the source of truth for the number of entries, and then verify that the size of the lookup table matches that. But for commit-graph files, we do the opposite: we use the size of the lookup table as the source of truth, and then check the final fanout entry against it. As noted in 4169d89645 (commit-graph: check consistency of fanout table, 2023-10-09), either is correct. But there are a few reasons to prefer the fanout table as the source of truth: 1. The fanout entries are 32-bits on disk, and that defines the maximum number of entries we can store. But since the size of the lookup table is only bounded by the filesystem, it can be much larger. And hence computing it as the commit-graph does means that we may truncate the result when storing it in a uint32_t. 2. We read the fanout first, then the lookup table. If we're verifying the chunks as we read them, then we'd want to take the fanout as truth (we have nothing yet to check it against) and then we can check that the lookup table matches what we already know. 3. It is pointlessly inconsistent with the midx and pack idx code. Since the three have to do similar size and bounds checks, it is easier to reason about all three if they use the same approach. So this patch moves the assignment of g->num_commits to the fanout parser, and then we can check the size of the lookup chunk as soon as we try to load it. There's already a test covering this situation, which munges the final fanout entry to 2^32-1. In the current code we complain that it does not agree with the table size. But now that we treat the munged value as the source of truth, we'll complain that the lookup table is the wrong size (again, either is correct). So we'll have to update the message we expect (and likewise for an earlier test which does similar munging). There's a similar test for this situation on the midx side, but rather than making a very-large fanout value, it just truncates the lookup table. We could do that here, too, but the very-large fanout value actually shows an interesting corner case. On a 32-bit system, multiplying to find the expected table size would cause an integer overflow. Using st_mult() would detect that, but cause us to die() rather than falling back to the non-graph code path. Checking the size using division (as we do with existing chunk-size checks) avoids the overflow entirely, and the test demonstrates this when run on a 32-bit system. Signed-off-by: Jeff King --- commit-graph.c | 8 +++----- t/t5318-commit-graph.sh | 5 +++-- 2 files changed, 6 insertions(+), 7 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 374575b484..094814c2ba 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -300,10 +300,6 @@ static int verify_commit_graph_lite(struct commit_graph *g) return 1; } } - if (ntohl(g->chunk_oid_fanout[255]) != g->num_commits) { - error("commit-graph oid table and fanout disagree on size"); - return 1; - } return 0; } @@ -315,6 +311,7 @@ static int graph_read_oid_fanout(const unsigned char *chunk_start, if (chunk_size != 256 * sizeof(uint32_t)) return error("commit-graph oid fanout chunk is wrong size"); g->chunk_oid_fanout = (const uint32_t *)chunk_start; + g->num_commits = ntohl(g->chunk_oid_fanout[255]); return 0; } @@ -323,7 +320,8 @@ static int graph_read_oid_lookup(const unsigned char *chunk_start, { struct commit_graph *g = data; g->chunk_oid_lookup = chunk_start; - g->num_commits = chunk_size / g->hash_len; + if (chunk_size / g->hash_len != g->num_commits) + return error(_("commit-graph OID lookup chunk is the wrong size")); return 0; } diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index affb959d64..8bd7fcefb5 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -560,7 +560,7 @@ test_expect_success 'detect incorrect fanout' ' test_expect_success 'detect incorrect fanout final value' ' corrupt_graph_and_verify $GRAPH_BYTE_FANOUT2 "\01" \ - "oid table and fanout disagree on size" + "OID lookup chunk is the wrong size" ' test_expect_success 'detect incorrect OID order' ' @@ -850,7 +850,8 @@ test_expect_success 'reader notices too-small oid fanout chunk' ' test_expect_success 'reader notices fanout/lookup table mismatch' ' check_corrupt_chunk OIDF 1020 "FFFFFFFF" && cat >expect.err <<-\EOF && - error: commit-graph oid table and fanout disagree on size + error: commit-graph OID lookup chunk is the wrong size + error: commit-graph required OID lookup chunk missing or corrupted EOF test_cmp expect.err err ' From patchwork Thu Nov 9 07:25:07 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13450690 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 53B512594 for ; Thu, 9 Nov 2023 07:25:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=none Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 91E4E2126 for ; Wed, 8 Nov 2023 23:25:08 -0800 (PST) Received: (qmail 25498 invoked by uid 109); 9 Nov 2023 07:25:07 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Thu, 09 Nov 2023 07:25:07 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 20894 invoked by uid 111); 9 Nov 2023 07:25:11 -0000 Received: from coredump.intra.peff.net (HELO coredump.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Thu, 09 Nov 2023 02:25:11 -0500 Authentication-Results: peff.net; auth=none Date: Thu, 9 Nov 2023 02:25:07 -0500 From: Jeff King To: git@vger.kernel.org Cc: Taylor Blau Subject: [PATCH 7/9] commit-graph: check order while reading fanout chunk Message-ID: <20231109072507.GG2698043@coredump.intra.peff.net> References: <20231109070310.GA2697602@coredump.intra.peff.net> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20231109070310.GA2697602@coredump.intra.peff.net> We read the fanout chunk, storing a pointer to it, but only confirm that the entries are monotonic in a final "lite" verification step. Let's move that into the actual OIDF chunk callback, so that we can report problems immediately (for all the reasons given in the previous "commit-graph: abort as soon as we see a bogus chunk" commit). Signed-off-by: Jeff King --- commit-graph.c | 25 +++++++++++++------------ t/t5318-commit-graph.sh | 1 + 2 files changed, 14 insertions(+), 12 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 094814c2ba..4ba523cd15 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -277,8 +277,6 @@ struct commit_graph *load_commit_graph_one_fd_st(struct repository *r, static int verify_commit_graph_lite(struct commit_graph *g) { - int i; - /* * Basic validation shared between parse_commit_graph() * which'll be called every time the graph is used, and the @@ -291,27 +289,30 @@ static int verify_commit_graph_lite(struct commit_graph *g) * over g->num_commits, or runs a checksum on the commit-graph * itself. */ - for (i = 0; i < 255; i++) { - uint32_t oid_fanout1 = ntohl(g->chunk_oid_fanout[i]); - uint32_t oid_fanout2 = ntohl(g->chunk_oid_fanout[i + 1]); - - if (oid_fanout1 > oid_fanout2) { - error("commit-graph fanout values out of order"); - return 1; - } - } - return 0; } static int graph_read_oid_fanout(const unsigned char *chunk_start, size_t chunk_size, void *data) { struct commit_graph *g = data; + int i; + if (chunk_size != 256 * sizeof(uint32_t)) return error("commit-graph oid fanout chunk is wrong size"); g->chunk_oid_fanout = (const uint32_t *)chunk_start; g->num_commits = ntohl(g->chunk_oid_fanout[255]); + + for (i = 0; i < 255; i++) { + uint32_t oid_fanout1 = ntohl(g->chunk_oid_fanout[i]); + uint32_t oid_fanout2 = ntohl(g->chunk_oid_fanout[i + 1]); + + if (oid_fanout1 > oid_fanout2) { + error("commit-graph fanout values out of order"); + return 1; + } + } + return 0; } diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index 8bd7fcefb5..7fe7c72a87 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -867,6 +867,7 @@ test_expect_success 'reader notices out-of-bounds fanout' ' check_corrupt_chunk OIDF 0 $(printf "%02x000000" $(test_seq 0 254)) && cat >expect.err <<-\EOF && error: commit-graph fanout values out of order + error: commit-graph required OID fanout chunk missing or corrupted EOF test_cmp expect.err err ' From patchwork Thu Nov 9 07:25:24 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13450691 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A123A3217 for ; Thu, 9 Nov 2023 07:25:26 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=none Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 155981BFD for ; Wed, 8 Nov 2023 23:25:25 -0800 (PST) Received: (qmail 25512 invoked by uid 109); 9 Nov 2023 07:25:25 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Thu, 09 Nov 2023 07:25:25 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 20901 invoked by uid 111); 9 Nov 2023 07:25:29 -0000 Received: from coredump.intra.peff.net (HELO coredump.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Thu, 09 Nov 2023 02:25:29 -0500 Authentication-Results: peff.net; auth=none Date: Thu, 9 Nov 2023 02:25:24 -0500 From: Jeff King To: git@vger.kernel.org Cc: Taylor Blau Subject: [PATCH 8/9] commit-graph: drop verify_commit_graph_lite() Message-ID: <20231109072524.GH2698043@coredump.intra.peff.net> References: <20231109070310.GA2697602@coredump.intra.peff.net> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20231109070310.GA2697602@coredump.intra.peff.net> As we've moved all of the checks from this function directly into the chunk-reading code used by the caller (and there is only one caller), we can just drop it entirely. Signed-off-by: Jeff King --- commit-graph.c | 20 -------------------- 1 file changed, 20 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 4ba523cd15..6fbfe4c68e 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -275,23 +275,6 @@ struct commit_graph *load_commit_graph_one_fd_st(struct repository *r, return ret; } -static int verify_commit_graph_lite(struct commit_graph *g) -{ - /* - * Basic validation shared between parse_commit_graph() - * which'll be called every time the graph is used, and the - * much more expensive verify_commit_graph() used by - * "commit-graph verify". - * - * There should only be very basic checks here to ensure that - * we don't e.g. segfault in fill_commit_in_graph(), but - * because this is a very hot codepath nothing that e.g. loops - * over g->num_commits, or runs a checksum on the commit-graph - * itself. - */ - return 0; -} - static int graph_read_oid_fanout(const unsigned char *chunk_start, size_t chunk_size, void *data) { @@ -495,9 +478,6 @@ struct commit_graph *parse_commit_graph(struct repo_settings *s, oidread(&graph->oid, graph->data + graph->data_len - graph->hash_len); - if (verify_commit_graph_lite(graph)) - goto free_and_return; - free_chunkfile(cf); return graph; From patchwork Thu Nov 9 07:26:28 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13450692 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 78346DDB8 for ; Thu, 9 Nov 2023 07:26:30 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=none Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9068C1BD7 for ; Wed, 8 Nov 2023 23:26:29 -0800 (PST) Received: (qmail 25529 invoked by uid 109); 9 Nov 2023 07:26:28 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Thu, 09 Nov 2023 07:26:28 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 20914 invoked by uid 111); 9 Nov 2023 07:26:32 -0000 Received: from coredump.intra.peff.net (HELO coredump.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Thu, 09 Nov 2023 02:26:32 -0500 Authentication-Results: peff.net; auth=none Date: Thu, 9 Nov 2023 02:26:28 -0500 From: Jeff King To: git@vger.kernel.org Cc: Taylor Blau Subject: [PATCH 9/9] commit-graph: mark chunk error messages for translation Message-ID: <20231109072628.GI2698043@coredump.intra.peff.net> References: <20231109070310.GA2697602@coredump.intra.peff.net> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20231109070310.GA2697602@coredump.intra.peff.net> The patches from f32af12cee (Merge branch 'jk/chunk-bounds', 2023-10-23) added many new untranslated error messages. While it's unlikely for most users to see these messages at all, most of the other commit-graph error messages are translated (and likewise for the matching midx messages). Let's mark them all for consistency (and to help any poor unfortunate user who does manage to find a broken graph file). Signed-off-by: Jeff King --- The "wrong size" ones may be dropped eventually if we have a generic pair_chunk_expect() API, but it seemed easier to just fix them all mechanically in one go. commit-graph.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 6fbfe4c68e..acac9bf6e1 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -282,7 +282,7 @@ static int graph_read_oid_fanout(const unsigned char *chunk_start, int i; if (chunk_size != 256 * sizeof(uint32_t)) - return error("commit-graph oid fanout chunk is wrong size"); + return error(_("commit-graph oid fanout chunk is wrong size")); g->chunk_oid_fanout = (const uint32_t *)chunk_start; g->num_commits = ntohl(g->chunk_oid_fanout[255]); @@ -291,7 +291,7 @@ static int graph_read_oid_fanout(const unsigned char *chunk_start, uint32_t oid_fanout2 = ntohl(g->chunk_oid_fanout[i + 1]); if (oid_fanout1 > oid_fanout2) { - error("commit-graph fanout values out of order"); + error(_("commit-graph fanout values out of order")); return 1; } } @@ -314,7 +314,7 @@ static int graph_read_commit_data(const unsigned char *chunk_start, { struct commit_graph *g = data; if (chunk_size / GRAPH_DATA_WIDTH != g->num_commits) - return error("commit-graph commit data chunk is wrong size"); + return error(_("commit-graph commit data chunk is wrong size")); g->chunk_commit_data = chunk_start; return 0; } @@ -324,7 +324,7 @@ static int graph_read_generation_data(const unsigned char *chunk_start, { struct commit_graph *g = data; if (chunk_size / sizeof(uint32_t) != g->num_commits) - return error("commit-graph generations chunk is wrong size"); + return error(_("commit-graph generations chunk is wrong size")); g->chunk_generation_data = chunk_start; return 0; } @@ -334,7 +334,7 @@ static int graph_read_bloom_index(const unsigned char *chunk_start, { struct commit_graph *g = data; if (chunk_size / 4 != g->num_commits) { - warning("commit-graph changed-path index chunk is too small"); + warning(_("commit-graph changed-path index chunk is too small")); return -1; } g->chunk_bloom_indexes = chunk_start; @@ -348,8 +348,8 @@ static int graph_read_bloom_data(const unsigned char *chunk_start, uint32_t hash_version; if (chunk_size < BLOOMDATA_CHUNK_HEADER_SIZE) { - warning("ignoring too-small changed-path chunk" - " (%"PRIuMAX" < %"PRIuMAX") in commit-graph file", + warning(_("ignoring too-small changed-path chunk" + " (%"PRIuMAX" < %"PRIuMAX") in commit-graph file"), (uintmax_t)chunk_size, (uintmax_t)BLOOMDATA_CHUNK_HEADER_SIZE); return -1; @@ -605,7 +605,7 @@ int open_commit_graph_chain(const char *chain_file, /* treat empty files the same as missing */ errno = ENOENT; } else { - warning("commit-graph chain file too small"); + warning(_("commit-graph chain file too small")); errno = EINVAL; } return 0; @@ -946,7 +946,7 @@ static int fill_commit_in_graph(struct repository *r, parent_data_pos = edge_value & GRAPH_EDGE_LAST_MASK; do { if (g->chunk_extra_edges_size / sizeof(uint32_t) <= parent_data_pos) { - error("commit-graph extra-edges pointer out of bounds"); + error(_("commit-graph extra-edges pointer out of bounds")); free_commit_list(item->parents); item->parents = NULL; item->object.parsed = 0;