From patchwork Tue Sep 26 05:56:57 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13398681 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C3A2AE7D0C5 for ; Tue, 26 Sep 2023 05:57:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233704AbjIZF5I (ORCPT ); Tue, 26 Sep 2023 01:57:08 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35922 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233689AbjIZF5F (ORCPT ); Tue, 26 Sep 2023 01:57:05 -0400 Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B3712F3 for ; Mon, 25 Sep 2023 22:56:58 -0700 (PDT) Received: (qmail 14755 invoked by uid 109); 26 Sep 2023 05:56:58 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Tue, 26 Sep 2023 05:56:58 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 21426 invoked by uid 111); 26 Sep 2023 05:56:59 -0000 Received: from coredump.intra.peff.net (HELO coredump.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Tue, 26 Sep 2023 01:56:59 -0400 Authentication-Results: peff.net; auth=none Date: Tue, 26 Sep 2023 01:56:57 -0400 From: Jeff King To: git@vger.kernel.org Cc: Taylor Blau , Derrick Stolee Subject: [PATCH 1/6] commit-graph: factor out chain opening function Message-ID: <20230926055657.GA1341418@coredump.intra.peff.net> References: <20230926055452.GA1341109@coredump.intra.peff.net> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20230926055452.GA1341109@coredump.intra.peff.net> Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org The load_commit_graph_chain() function opens the chain file and all of the slices of graph that it points to. If there is no chain file (which is a totally normal condition), we return NULL. But if we run into errors with the chain file or loading the actual graph data, we also return NULL, and the caller cannot tell the difference. The caller can check for ENOENT for the unremarkable "no such file" case. But I'm hesitant to assume that the rest of the function would never accidentally set errno to ENOENT itself, since it is opening the slice files (and that would mean the caller fails to notice a real error). So let's break this into two functions: one to open the file, and one to actually load it. This matches the interface we provide for the non-chain graph file, which will also come in handy in a moment when we fix some bugs in the "git commit-graph verify" code. Some notes: - I've kept the "1 is good, 0 is bad" return convention (and the weird "fd" out-parameter) used by the matching open_commit_graph() function and other parts of the commit-graph code. This is unlike most of the rest of Git (which would just return the fd, with -1 for error), but it makes sense to stay consistent with the adjacent bits of the API here. - The existing chain loading function will quietly return if the file is too small to hold a single entry. I've retained that behavior (and explicitly set ENOENT in the opener function) for now, under the notion that it's probably valid (though I'd imagine unusual) to have an empty chain file. There are two small behavior changes here, but I think both are strictly positive: 1. The original blindly did a stat() before checking if fopen() succeeded, meaning we were making a pointless extra stat call. 2. We now use fstat() to check the file size. The previous code using a regular stat() on the pathname meant we could technically race with somebody updating the chain file, and end up with a size that does not match what we just opened with fopen(). I doubt anybody ever hit this in practice, but it may have caused an out-of-bounds read. We'll retain the load_commit_graph_chain() function which does both the open and reading steps (most existing callers do not care about seeing errors anyway, since loading commit-graphs is optimistic). Signed-off-by: Jeff King --- commit-graph.c | 58 +++++++++++++++++++++++++++++++++----------------- commit-graph.h | 3 +++ 2 files changed, 42 insertions(+), 19 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 5e8a3a5085..12cdd9af8e 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -513,31 +513,34 @@ static int add_graph_to_chain(struct commit_graph *g, return 1; } -static struct commit_graph *load_commit_graph_chain(struct repository *r, - struct object_directory *odb) +int open_commit_graph_chain(const char *chain_file, + int *fd, struct stat *st) +{ + *fd = git_open(chain_file); + if (*fd < 0) + return 0; + if (fstat(*fd, st)) { + close(*fd); + return 0; + } + if (st->st_size <= the_hash_algo->hexsz) { + close(*fd); + errno = ENOENT; + return 0; + } + return 1; +} + +struct commit_graph *load_commit_graph_chain_fd_st(struct repository *r, + int fd, struct stat *st) { struct commit_graph *graph_chain = NULL; struct strbuf line = STRBUF_INIT; - struct stat st; struct object_id *oids; int i = 0, valid = 1, count; - char *chain_name = get_commit_graph_chain_filename(odb); - FILE *fp; - int stat_res; + FILE *fp = xfdopen(fd, "r"); - fp = fopen(chain_name, "r"); - stat_res = stat(chain_name, &st); - free(chain_name); - - if (!fp) - return NULL; - if (stat_res || - st.st_size <= the_hash_algo->hexsz) { - fclose(fp); - return NULL; - } - - count = st.st_size / (the_hash_algo->hexsz + 1); + count = st->st_size / (the_hash_algo->hexsz + 1); CALLOC_ARRAY(oids, count); prepare_alt_odb(r); @@ -585,6 +588,23 @@ static struct commit_graph *load_commit_graph_chain(struct repository *r, return graph_chain; } +static struct commit_graph *load_commit_graph_chain(struct repository *r, + struct object_directory *odb) +{ + char *chain_file = get_commit_graph_chain_filename(odb); + struct stat st; + int fd; + struct commit_graph *g = NULL; + + if (open_commit_graph_chain(chain_file, &fd, &st)) { + /* ownership of fd is taken over by load function */ + g = load_commit_graph_chain_fd_st(r, fd, &st); + } + + free(chain_file); + return g; +} + /* * returns 1 if and only if all graphs in the chain have * corrected commit dates stored in the generation_data chunk. diff --git a/commit-graph.h b/commit-graph.h index 5e534f0fcc..3b662fd2b5 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -26,6 +26,7 @@ struct string_list; char *get_commit_graph_filename(struct object_directory *odb); char *get_commit_graph_chain_filename(struct object_directory *odb); int open_commit_graph(const char *graph_file, int *fd, struct stat *st); +int open_commit_graph_chain(const char *chain_file, int *fd, struct stat *st); /* * Given a commit struct, try to fill the commit struct info, including: @@ -105,6 +106,8 @@ struct commit_graph { struct commit_graph *load_commit_graph_one_fd_st(struct repository *r, int fd, struct stat *st, struct object_directory *odb); +struct commit_graph *load_commit_graph_chain_fd_st(struct repository *r, + int fd, struct stat *st); struct commit_graph *read_commit_graph_one(struct repository *r, struct object_directory *odb); From patchwork Tue Sep 26 05:57:51 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13398682 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3201EE7D0C5 for ; Tue, 26 Sep 2023 05:57:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233703AbjIZF6C (ORCPT ); Tue, 26 Sep 2023 01:58:02 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58904 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233629AbjIZF57 (ORCPT ); Tue, 26 Sep 2023 01:57:59 -0400 Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2305A9D for ; Mon, 25 Sep 2023 22:57:53 -0700 (PDT) Received: (qmail 14765 invoked by uid 109); 26 Sep 2023 05:57:53 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Tue, 26 Sep 2023 05:57:53 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 21434 invoked by uid 111); 26 Sep 2023 05:57:54 -0000 Received: from coredump.intra.peff.net (HELO coredump.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Tue, 26 Sep 2023 01:57:54 -0400 Authentication-Results: peff.net; auth=none Date: Tue, 26 Sep 2023 01:57:51 -0400 From: Jeff King To: git@vger.kernel.org Cc: Taylor Blau , Derrick Stolee Subject: [PATCH 2/6] commit-graph: check mixed generation validation when loading chain file Message-ID: <20230926055751.GB1341418@coredump.intra.peff.net> References: <20230926055452.GA1341109@coredump.intra.peff.net> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20230926055452.GA1341109@coredump.intra.peff.net> Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org In read_commit_graph_one(), we call validate_mixed_generation_chain() after loading the graph. Even though we don't check the return value, this has the side effect of clearing the read_generation_data flag, which is important when working with mixed generation numbers. But doing this in load_commit_graph_chain_fd_st() makes more sense: 1. We are calling it even when we did not load a chain at all, which is pointless (you cannot have mixed generations in a single file). 2. For now, all callers load the graph via read_commit_graph_one(). But the point of factoring out the open/load in the previous commit was to let "commit-graph verify" call them separately. So it needs to trigger this function as part of the load. Without this patch, the mixed-generation tests in t5324 would start failing on "git commit-graph verify" calls, once we switch to using a separate open/load call there. Signed-off-by: Jeff King --- commit-graph.c | 54 +++++++++++++++++++++++++------------------------- 1 file changed, 27 insertions(+), 27 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 12cdd9af8e..8b29c6de24 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -473,6 +473,31 @@ static struct commit_graph *load_commit_graph_v1(struct repository *r, return g; } +/* + * returns 1 if and only if all graphs in the chain have + * corrected commit dates stored in the generation_data chunk. + */ +static int validate_mixed_generation_chain(struct commit_graph *g) +{ + int read_generation_data = 1; + struct commit_graph *p = g; + + while (read_generation_data && p) { + read_generation_data = p->read_generation_data; + p = p->base_graph; + } + + if (read_generation_data) + return 1; + + while (g) { + g->read_generation_data = 0; + g = g->base_graph; + } + + return 0; +} + static int add_graph_to_chain(struct commit_graph *g, struct commit_graph *chain, struct object_id *oids, @@ -581,6 +606,8 @@ struct commit_graph *load_commit_graph_chain_fd_st(struct repository *r, } } + validate_mixed_generation_chain(graph_chain); + free(oids); fclose(fp); strbuf_release(&line); @@ -605,31 +632,6 @@ static struct commit_graph *load_commit_graph_chain(struct repository *r, return g; } -/* - * returns 1 if and only if all graphs in the chain have - * corrected commit dates stored in the generation_data chunk. - */ -static int validate_mixed_generation_chain(struct commit_graph *g) -{ - int read_generation_data = 1; - struct commit_graph *p = g; - - while (read_generation_data && p) { - read_generation_data = p->read_generation_data; - p = p->base_graph; - } - - if (read_generation_data) - return 1; - - while (g) { - g->read_generation_data = 0; - g = g->base_graph; - } - - return 0; -} - struct commit_graph *read_commit_graph_one(struct repository *r, struct object_directory *odb) { @@ -638,8 +640,6 @@ struct commit_graph *read_commit_graph_one(struct repository *r, if (!g) g = load_commit_graph_chain(r, odb); - validate_mixed_generation_chain(g); - return g; } From patchwork Tue Sep 26 05:58:36 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13398683 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id BB865E7D0C5 for ; Tue, 26 Sep 2023 05:58:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233682AbjIZF6r (ORCPT ); Tue, 26 Sep 2023 01:58:47 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52506 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232376AbjIZF6o (ORCPT ); Tue, 26 Sep 2023 01:58:44 -0400 Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E88CFE9 for ; Mon, 25 Sep 2023 22:58:37 -0700 (PDT) Received: (qmail 14775 invoked by uid 109); 26 Sep 2023 05:58:37 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Tue, 26 Sep 2023 05:58:37 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 21441 invoked by uid 111); 26 Sep 2023 05:58:38 -0000 Received: from coredump.intra.peff.net (HELO coredump.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Tue, 26 Sep 2023 01:58:38 -0400 Authentication-Results: peff.net; auth=none Date: Tue, 26 Sep 2023 01:58:36 -0400 From: Jeff King To: git@vger.kernel.org Cc: Taylor Blau , Derrick Stolee Subject: [PATCH 3/6] t5324: harmonize sha1/sha256 graph chain corruption Message-ID: <20230926055836.GC1341418@coredump.intra.peff.net> References: <20230926055452.GA1341109@coredump.intra.peff.net> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20230926055452.GA1341109@coredump.intra.peff.net> Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org In t5324.20, we corrupt a hex character 60 bytes into the graph chain file. Since the file consists of two hash identifiers, one per line, the corruption differs between sha1 and sha256. In a sha1 repository, the corruption is on the second line, and in a sha256 repository, it is on the first. We should of course detect the problem with either line. But as the next few patches will show (and fix), that is not the case (in fact, we currently do not exit non-zero for either line!). And while at the end of our series we'll catch all errors, our intermediate states will have differing behavior between the two hashes. Let's make this test behave consistently with either hash by always corrupting the first line. We'll add additional tests that explicitly cover the second line as we fix those bugs. Signed-off-by: Jeff King --- t/t5324-split-commit-graph.sh | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh index 36c4141e67..e335ef87a6 100755 --- a/t/t5324-split-commit-graph.sh +++ b/t/t5324-split-commit-graph.sh @@ -316,11 +316,11 @@ test_expect_success 'verify after commit-graph-chain corruption' ' git clone --no-hardlinks . verify-chain && ( cd verify-chain && - corrupt_file "$graphdir/commit-graph-chain" 60 "G" && + corrupt_file "$graphdir/commit-graph-chain" 30 "G" && git commit-graph verify 2>test_err && grep -v "^+" test_err >err && test_i18ngrep "invalid commit-graph chain" err && - corrupt_file "$graphdir/commit-graph-chain" 60 "A" && + corrupt_file "$graphdir/commit-graph-chain" 30 "A" && git commit-graph verify 2>test_err && grep -v "^+" test_err >err && test_i18ngrep "unable to find all commit-graph files" err From patchwork Tue Sep 26 06:00:13 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13398701 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 11B96E8181D for ; Tue, 26 Sep 2023 06:00:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229722AbjIZGAX (ORCPT ); Tue, 26 Sep 2023 02:00:23 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41998 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229461AbjIZGAW (ORCPT ); Tue, 26 Sep 2023 02:00:22 -0400 Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1A2E8FF for ; Mon, 25 Sep 2023 23:00:15 -0700 (PDT) Received: (qmail 14785 invoked by uid 109); 26 Sep 2023 06:00:15 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Tue, 26 Sep 2023 06:00:15 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 21475 invoked by uid 111); 26 Sep 2023 06:00:16 -0000 Received: from coredump.intra.peff.net (HELO coredump.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Tue, 26 Sep 2023 02:00:16 -0400 Authentication-Results: peff.net; auth=none Date: Tue, 26 Sep 2023 02:00:13 -0400 From: Jeff King To: git@vger.kernel.org Cc: Taylor Blau , Derrick Stolee Subject: [PATCH 4/6] commit-graph: detect read errors when verifying graph chain Message-ID: <20230926060013.GD1341418@coredump.intra.peff.net> References: <20230926055452.GA1341109@coredump.intra.peff.net> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20230926055452.GA1341109@coredump.intra.peff.net> Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Because it's OK to not have a graph file at all, the graph_verify() function needs to tell the difference between a missing file and a real error. So when loading a traditional graph file, we call open_commit_graph() separately from load_commit_graph_chain_fd_st(), and don't complain if the first one fails with ENOENT. When the function learned about chain files in 3da4b609bb (commit-graph: verify chains with --shallow mode, 2019-06-18), we couldn't be as careful, since the only way to load a chain was with read_commit_graph_one(), which did both the open/load as a single unit. So we'll miss errors in chain files we load, thinking instead that there was just no chain file at all. Note that we do still report some of these problems to stderr, as the loading function calls error() and warning(). But we'd exit with a successful exit code, which is wrong. We can fix that by using the recently split open/load functions for chains. That lets us treat the chain file just like a single file with respect to error handling here. An existing test (from 3da4b609bb) shows off the problem; we were expecting "commit-graph verify" to report success, but that makes no sense. We did not even verify the contents of the graph data, because we couldn't load it! I don't think this was an intentional exception, but rather just the test covering what happened to occur. Signed-off-by: Jeff King --- builtin/commit-graph.c | 23 ++++++++++++++++------- t/t5324-split-commit-graph.sh | 4 ++-- 2 files changed, 18 insertions(+), 9 deletions(-) diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index c88389df24..50c15d9bfe 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -69,7 +69,8 @@ static int graph_verify(int argc, const char **argv, const char *prefix) struct commit_graph *graph = NULL; struct object_directory *odb = NULL; char *graph_name; - int open_ok; + char *chain_name; + enum { OPENED_NONE, OPENED_GRAPH, OPENED_CHAIN } opened = OPENED_NONE; int fd; struct stat st; int flags = 0; @@ -102,21 +103,29 @@ static int graph_verify(int argc, const char **argv, const char *prefix) odb = find_odb(the_repository, opts.obj_dir); graph_name = get_commit_graph_filename(odb); - open_ok = open_commit_graph(graph_name, &fd, &st); - if (!open_ok && errno != ENOENT) + chain_name = get_commit_graph_chain_filename(odb); + if (open_commit_graph(graph_name, &fd, &st)) + opened = OPENED_GRAPH; + else if (errno != ENOENT) die_errno(_("Could not open commit-graph '%s'"), graph_name); + else if (open_commit_graph_chain(chain_name, &fd, &st)) + opened = OPENED_CHAIN; + else if (errno != ENOENT) + die_errno(_("could not open commit-graph chain '%s'"), chain_name); FREE_AND_NULL(graph_name); + FREE_AND_NULL(chain_name); FREE_AND_NULL(options); - if (open_ok) + if (opened == OPENED_NONE) + return 0; + else if (opened == OPENED_GRAPH) graph = load_commit_graph_one_fd_st(the_repository, fd, &st, odb); else - graph = read_commit_graph_one(the_repository, odb); + graph = load_commit_graph_chain_fd_st(the_repository, fd, &st); - /* Return failure if open_ok predicted success */ if (!graph) - return !!open_ok; + return 1; ret = verify_commit_graph(the_repository, graph, flags); free_commit_graph(graph); diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh index e335ef87a6..0ac7bbd1dc 100755 --- a/t/t5324-split-commit-graph.sh +++ b/t/t5324-split-commit-graph.sh @@ -317,11 +317,11 @@ test_expect_success 'verify after commit-graph-chain corruption' ' ( cd verify-chain && corrupt_file "$graphdir/commit-graph-chain" 30 "G" && - git commit-graph verify 2>test_err && + test_must_fail git commit-graph verify 2>test_err && grep -v "^+" test_err >err && test_i18ngrep "invalid commit-graph chain" err && corrupt_file "$graphdir/commit-graph-chain" 30 "A" && - git commit-graph verify 2>test_err && + test_must_fail git commit-graph verify 2>test_err && grep -v "^+" test_err >err && test_i18ngrep "unable to find all commit-graph files" err ) From patchwork Tue Sep 26 06:03:00 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13398702 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6A9A2E7D0C5 for ; Tue, 26 Sep 2023 06:03:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233702AbjIZGDJ (ORCPT ); Tue, 26 Sep 2023 02:03:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45416 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231466AbjIZGDI (ORCPT ); Tue, 26 Sep 2023 02:03:08 -0400 Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 50008DE for ; Mon, 25 Sep 2023 23:03:02 -0700 (PDT) Received: (qmail 14795 invoked by uid 109); 26 Sep 2023 06:03:02 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Tue, 26 Sep 2023 06:03:02 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 21487 invoked by uid 111); 26 Sep 2023 06:03:03 -0000 Received: from coredump.intra.peff.net (HELO coredump.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Tue, 26 Sep 2023 02:03:03 -0400 Authentication-Results: peff.net; auth=none Date: Tue, 26 Sep 2023 02:03:00 -0400 From: Jeff King To: git@vger.kernel.org Cc: Taylor Blau , Derrick Stolee Subject: [PATCH 5/6] commit-graph: tighten chain size check Message-ID: <20230926060300.GE1341418@coredump.intra.peff.net> References: <20230926055452.GA1341109@coredump.intra.peff.net> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20230926055452.GA1341109@coredump.intra.peff.net> Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org When we open a commit-graph-chain file, if it's smaller than a single entry, we just quietly treat that as ENOENT. That make some sense if the file is truly zero bytes, but it means that "commit-graph verify" will quietly ignore a file that contains garbage if that garbage happens to be short. Instead, let's only simulate ENOENT when the file is truly empty, and otherwise return EINVAL. The normal graph-loading routines don't care, but "commit-graph verify" will notice and complain about the difference. It's not entirely clear to me that the 0-is-ENOENT case actually happens in real life, so we could perhaps just eliminate this special-case altogether. But this is how we've always behaved, so I'm preserving it in the name of backwards compatibility (though again, it really only matters for "verify", as the regular routines are happy to load what they can). Signed-off-by: Jeff King --- There's a related issue which I didn't fix in this series, which is that the "load" function does a truncating division of (size / hash_size + 1) to find the number of entries (the +1 is for the newline). So a less-than-hash-sized chunk of garbage at the end of the file will be ignored. The simplest way to fix that is that we can insist that the size mod hash_size+1 is 0. But I was wary of being too picky here (e.g., what about a file that doesn't end with newline) especially in a way that affected more than just "verify". commit-graph.c | 10 ++++++++-- t/t5324-split-commit-graph.sh | 12 ++++++++++++ 2 files changed, 20 insertions(+), 2 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 8b29c6de24..b1d3e5bf94 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -548,9 +548,15 @@ int open_commit_graph_chain(const char *chain_file, close(*fd); return 0; } - if (st->st_size <= the_hash_algo->hexsz) { + if (st->st_size < the_hash_algo->hexsz) { close(*fd); - errno = ENOENT; + if (!st->st_size) { + /* treat empty files the same as missing */ + errno = ENOENT; + } else { + warning("commit-graph chain file too small"); + errno = EINVAL; + } return 0; } return 1; diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh index 0ac7bbd1dc..86d56debf6 100755 --- a/t/t5324-split-commit-graph.sh +++ b/t/t5324-split-commit-graph.sh @@ -327,6 +327,18 @@ test_expect_success 'verify after commit-graph-chain corruption' ' ) ' +test_expect_success 'verify notices too-short chain file' ' + git clone --no-hardlinks . verify-chain-garbage && + ( + cd verify-chain-garbage && + git commit-graph verify && + echo "garbage" >$graphdir/commit-graph-chain && + test_must_fail git commit-graph verify 2>test_err && + grep -v "^+" test_err >err && + grep "commit-graph chain file too small" err + ) +' + test_expect_success 'verify across alternates' ' git clone --no-hardlinks . verify-alt && ( From patchwork Tue Sep 26 06:04:30 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13398708 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 178EDE7D0C5 for ; Tue, 26 Sep 2023 06:04:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232546AbjIZGEp (ORCPT ); Tue, 26 Sep 2023 02:04:45 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48680 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232542AbjIZGEi (ORCPT ); Tue, 26 Sep 2023 02:04:38 -0400 Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8A03DD6 for ; Mon, 25 Sep 2023 23:04:31 -0700 (PDT) Received: (qmail 14809 invoked by uid 109); 26 Sep 2023 06:04:31 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Tue, 26 Sep 2023 06:04:31 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 21492 invoked by uid 111); 26 Sep 2023 06:04:32 -0000 Received: from coredump.intra.peff.net (HELO coredump.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Tue, 26 Sep 2023 02:04:32 -0400 Authentication-Results: peff.net; auth=none Date: Tue, 26 Sep 2023 02:04:30 -0400 From: Jeff King To: git@vger.kernel.org Cc: Taylor Blau , Derrick Stolee Subject: [PATCH 6/6] commit-graph: report incomplete chains during verification Message-ID: <20230926060430.GF1341418@coredump.intra.peff.net> References: <20230926055452.GA1341109@coredump.intra.peff.net> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20230926055452.GA1341109@coredump.intra.peff.net> Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org The load_commit_graph_chain_fd_st() function will stop loading chains when it sees an error. But if it has loaded any graph slice at all, it will return it. This is a good thing for normal use (we use what data we can, and this is just an optimization). But it's a bad thing for "commit-graph verify", which should be careful about finding any irregularities. We do complain to stderr with a warning(), but the verify command still exits with a successful return code. The new tests here cover corruption of both the base and tip slices of the chain. The corruption of the base file already works (it is the first file we look at, so when we see the error we return NULL). The "tip" case is what is fixed by this patch (it complains to stderr but still returns the base slice). Note that this also causes us to adjust a test later in the file that similarly corrupts a tip (though confusingly the test script calls this "base"). It checks stderr but erroneously expects the whole "verify" command to exit with a successful code. Signed-off-by: Jeff King --- builtin/commit-graph.c | 10 +++++++++- commit-graph.c | 9 +++++++-- commit-graph.h | 3 ++- t/t5324-split-commit-graph.sh | 28 +++++++++++++++++++++++++++- 4 files changed, 45 insertions(+), 5 deletions(-) diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index 50c15d9bfe..f5e66e9969 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -74,6 +74,7 @@ static int graph_verify(int argc, const char **argv, const char *prefix) int fd; struct stat st; int flags = 0; + int incomplete_chain = 0; int ret; static struct option builtin_commit_graph_verify_options[] = { @@ -122,13 +123,20 @@ static int graph_verify(int argc, const char **argv, const char *prefix) else if (opened == OPENED_GRAPH) graph = load_commit_graph_one_fd_st(the_repository, fd, &st, odb); else - graph = load_commit_graph_chain_fd_st(the_repository, fd, &st); + graph = load_commit_graph_chain_fd_st(the_repository, fd, &st, + &incomplete_chain); if (!graph) return 1; ret = verify_commit_graph(the_repository, graph, flags); free_commit_graph(graph); + + if (incomplete_chain) { + error("one or more commit-graph chain files could not be loaded"); + ret |= 1; + } + return ret; } diff --git a/commit-graph.c b/commit-graph.c index b1d3e5bf94..148af50058 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -563,14 +563,17 @@ int open_commit_graph_chain(const char *chain_file, } struct commit_graph *load_commit_graph_chain_fd_st(struct repository *r, - int fd, struct stat *st) + int fd, struct stat *st, + int *incomplete_chain) { struct commit_graph *graph_chain = NULL; struct strbuf line = STRBUF_INIT; struct object_id *oids; int i = 0, valid = 1, count; FILE *fp = xfdopen(fd, "r"); + *incomplete_chain = 0; + count = st->st_size / (the_hash_algo->hexsz + 1); CALLOC_ARRAY(oids, count); @@ -608,6 +611,7 @@ struct commit_graph *load_commit_graph_chain_fd_st(struct repository *r, if (!valid) { warning(_("unable to find all commit-graph files")); + *incomplete_chain = 1; break; } } @@ -630,8 +634,9 @@ static struct commit_graph *load_commit_graph_chain(struct repository *r, struct commit_graph *g = NULL; if (open_commit_graph_chain(chain_file, &fd, &st)) { + int incomplete; /* ownership of fd is taken over by load function */ - g = load_commit_graph_chain_fd_st(r, fd, &st); + g = load_commit_graph_chain_fd_st(r, fd, &st, &incomplete); } free(chain_file); diff --git a/commit-graph.h b/commit-graph.h index 3b662fd2b5..20ada7e891 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -107,7 +107,8 @@ struct commit_graph *load_commit_graph_one_fd_st(struct repository *r, int fd, struct stat *st, struct object_directory *odb); struct commit_graph *load_commit_graph_chain_fd_st(struct repository *r, - int fd, struct stat *st); + int fd, struct stat *st, + int *incomplete_chain); struct commit_graph *read_commit_graph_one(struct repository *r, struct object_directory *odb); diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh index 86d56debf6..598660557a 100755 --- a/t/t5324-split-commit-graph.sh +++ b/t/t5324-split-commit-graph.sh @@ -285,6 +285,32 @@ test_expect_success 'verify hashes along chain, even in shallow' ' ) ' +test_expect_success 'verify notices chain slice which is bogus (base)' ' + git clone --no-hardlinks . verify-chain-bogus-base && + ( + cd verify-chain-bogus-base && + git commit-graph verify && + base_file=$graphdir/graph-$(sed -n 1p $graphdir/commit-graph-chain).graph && + echo "garbage" >$base_file && + test_must_fail git commit-graph verify 2>test_err && + grep -v "^+" test_err >err && + grep "commit-graph file is too small" err + ) +' + +test_expect_success 'verify notices chain slice which is bogus (tip)' ' + git clone --no-hardlinks . verify-chain-bogus-tip && + ( + cd verify-chain-bogus-tip && + git commit-graph verify && + tip_file=$graphdir/graph-$(sed -n 2p $graphdir/commit-graph-chain).graph && + echo "garbage" >$tip_file && + test_must_fail git commit-graph verify 2>test_err && + grep -v "^+" test_err >err && + grep "commit-graph file is too small" err + ) +' + test_expect_success 'verify --shallow does not check base contents' ' git clone --no-hardlinks . verify-shallow && ( @@ -306,7 +332,7 @@ test_expect_success 'warn on base graph chunk incorrect' ' git commit-graph verify && base_file=$graphdir/graph-$(tail -n 1 $graphdir/commit-graph-chain).graph && corrupt_file "$base_file" $(test_oid base) "\01" && - git commit-graph verify --shallow 2>test_err && + test_must_fail git commit-graph verify --shallow 2>test_err && grep -v "^+" test_err >err && test_i18ngrep "commit-graph chain does not match" err )