From patchwork Thu Sep 28 04:38:07 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13401948 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6E975CE7AFE for ; Thu, 28 Sep 2023 04:38:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229920AbjI1EiL (ORCPT ); Thu, 28 Sep 2023 00:38:11 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41484 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229453AbjI1EiK (ORCPT ); Thu, 28 Sep 2023 00:38:10 -0400 Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D032F12A for ; Wed, 27 Sep 2023 21:38:08 -0700 (PDT) Received: (qmail 3669 invoked by uid 109); 28 Sep 2023 04:38:08 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Thu, 28 Sep 2023 04:38:08 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 9647 invoked by uid 111); 28 Sep 2023 04:38:09 -0000 Received: from coredump.intra.peff.net (HELO coredump.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Thu, 28 Sep 2023 00:38:09 -0400 Authentication-Results: peff.net; auth=none Date: Thu, 28 Sep 2023 00:38:07 -0400 From: Jeff King To: git@vger.kernel.org Cc: Taylor Blau , Derrick Stolee Subject: [PATCH v2 1/6] commit-graph: factor out chain opening function Message-ID: <20230928043807.GA58367@coredump.intra.peff.net> References: <20230928043746.GB57926@coredump.intra.peff.net> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20230928043746.GB57926@coredump.intra.peff.net> Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org The load_commit_graph_chain() function opens the chain file and all of the slices of graph that it points to. If there is no chain file (which is a totally normal condition), we return NULL. But if we run into errors with the chain file or loading the actual graph data, we also return NULL, and the caller cannot tell the difference. The caller can check for ENOENT for the unremarkable "no such file" case. But I'm hesitant to assume that the rest of the function would never accidentally set errno to ENOENT itself, since it is opening the slice files (and that would mean the caller fails to notice a real error). So let's break this into two functions: one to open the file, and one to actually load it. This matches the interface we provide for the non-chain graph file, which will also come in handy in a moment when we fix some bugs in the "git commit-graph verify" code. Some notes: - I've kept the "1 is good, 0 is bad" return convention (and the weird "fd" out-parameter) used by the matching open_commit_graph() function and other parts of the commit-graph code. This is unlike most of the rest of Git (which would just return the fd, with -1 for error), but it makes sense to stay consistent with the adjacent bits of the API here. - The existing chain loading function will quietly return if the file is too small to hold a single entry. I've retained that behavior (and explicitly set ENOENT in the opener function) for now, under the notion that it's probably valid (though I'd imagine unusual) to have an empty chain file. There are two small behavior changes here, but I think both are strictly positive: 1. The original blindly did a stat() before checking if fopen() succeeded, meaning we were making a pointless extra stat call. 2. We now use fstat() to check the file size. The previous code using a regular stat() on the pathname meant we could technically race with somebody updating the chain file, and end up with a size that does not match what we just opened with fopen(). I doubt anybody ever hit this in practice, but it may have caused an out-of-bounds read. We'll retain the load_commit_graph_chain() function which does both the open and reading steps (most existing callers do not care about seeing errors anyway, since loading commit-graphs is optimistic). Signed-off-by: Jeff King --- Same as v1. commit-graph.c | 58 +++++++++++++++++++++++++++++++++----------------- commit-graph.h | 3 +++ 2 files changed, 42 insertions(+), 19 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 5e8a3a5085..12cdd9af8e 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -513,31 +513,34 @@ static int add_graph_to_chain(struct commit_graph *g, return 1; } -static struct commit_graph *load_commit_graph_chain(struct repository *r, - struct object_directory *odb) +int open_commit_graph_chain(const char *chain_file, + int *fd, struct stat *st) +{ + *fd = git_open(chain_file); + if (*fd < 0) + return 0; + if (fstat(*fd, st)) { + close(*fd); + return 0; + } + if (st->st_size <= the_hash_algo->hexsz) { + close(*fd); + errno = ENOENT; + return 0; + } + return 1; +} + +struct commit_graph *load_commit_graph_chain_fd_st(struct repository *r, + int fd, struct stat *st) { struct commit_graph *graph_chain = NULL; struct strbuf line = STRBUF_INIT; - struct stat st; struct object_id *oids; int i = 0, valid = 1, count; - char *chain_name = get_commit_graph_chain_filename(odb); - FILE *fp; - int stat_res; + FILE *fp = xfdopen(fd, "r"); - fp = fopen(chain_name, "r"); - stat_res = stat(chain_name, &st); - free(chain_name); - - if (!fp) - return NULL; - if (stat_res || - st.st_size <= the_hash_algo->hexsz) { - fclose(fp); - return NULL; - } - - count = st.st_size / (the_hash_algo->hexsz + 1); + count = st->st_size / (the_hash_algo->hexsz + 1); CALLOC_ARRAY(oids, count); prepare_alt_odb(r); @@ -585,6 +588,23 @@ static struct commit_graph *load_commit_graph_chain(struct repository *r, return graph_chain; } +static struct commit_graph *load_commit_graph_chain(struct repository *r, + struct object_directory *odb) +{ + char *chain_file = get_commit_graph_chain_filename(odb); + struct stat st; + int fd; + struct commit_graph *g = NULL; + + if (open_commit_graph_chain(chain_file, &fd, &st)) { + /* ownership of fd is taken over by load function */ + g = load_commit_graph_chain_fd_st(r, fd, &st); + } + + free(chain_file); + return g; +} + /* * returns 1 if and only if all graphs in the chain have * corrected commit dates stored in the generation_data chunk. diff --git a/commit-graph.h b/commit-graph.h index 5e534f0fcc..3b662fd2b5 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -26,6 +26,7 @@ struct string_list; char *get_commit_graph_filename(struct object_directory *odb); char *get_commit_graph_chain_filename(struct object_directory *odb); int open_commit_graph(const char *graph_file, int *fd, struct stat *st); +int open_commit_graph_chain(const char *chain_file, int *fd, struct stat *st); /* * Given a commit struct, try to fill the commit struct info, including: @@ -105,6 +106,8 @@ struct commit_graph { struct commit_graph *load_commit_graph_one_fd_st(struct repository *r, int fd, struct stat *st, struct object_directory *odb); +struct commit_graph *load_commit_graph_chain_fd_st(struct repository *r, + int fd, struct stat *st); struct commit_graph *read_commit_graph_one(struct repository *r, struct object_directory *odb); From patchwork Thu Sep 28 04:38:17 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13401949 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6562BCE7AFE for ; Thu, 28 Sep 2023 04:38:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230115AbjI1EiW (ORCPT ); Thu, 28 Sep 2023 00:38:22 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47818 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230089AbjI1EiV (ORCPT ); Thu, 28 Sep 2023 00:38:21 -0400 Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4A60F180 for ; Wed, 27 Sep 2023 21:38:19 -0700 (PDT) Received: (qmail 3679 invoked by uid 109); 28 Sep 2023 04:38:19 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Thu, 28 Sep 2023 04:38:19 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 9656 invoked by uid 111); 28 Sep 2023 04:38:20 -0000 Received: from coredump.intra.peff.net (HELO coredump.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Thu, 28 Sep 2023 00:38:20 -0400 Authentication-Results: peff.net; auth=none Date: Thu, 28 Sep 2023 00:38:17 -0400 From: Jeff King To: git@vger.kernel.org Cc: Taylor Blau , Derrick Stolee Subject: [PATCH v2 2/6] commit-graph: check mixed generation validation when loading chain file Message-ID: <20230928043817.GB58367@coredump.intra.peff.net> References: <20230928043746.GB57926@coredump.intra.peff.net> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20230928043746.GB57926@coredump.intra.peff.net> Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org In read_commit_graph_one(), we call validate_mixed_generation_chain() after loading the graph. Even though we don't check the return value, this has the side effect of clearing the read_generation_data flag, which is important when working with mixed generation numbers. But doing this in load_commit_graph_chain_fd_st() makes more sense: 1. We are calling it even when we did not load a chain at all, which is pointless (you cannot have mixed generations in a single file). 2. For now, all callers load the graph via read_commit_graph_one(). But the point of factoring out the open/load in the previous commit was to let "commit-graph verify" call them separately. So it needs to trigger this function as part of the load. Without this patch, the mixed-generation tests in t5324 would start failing on "git commit-graph verify" calls, once we switch to using a separate open/load call there. Signed-off-by: Jeff King --- Same as v1. commit-graph.c | 54 +++++++++++++++++++++++++------------------------- 1 file changed, 27 insertions(+), 27 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 12cdd9af8e..8b29c6de24 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -473,6 +473,31 @@ static struct commit_graph *load_commit_graph_v1(struct repository *r, return g; } +/* + * returns 1 if and only if all graphs in the chain have + * corrected commit dates stored in the generation_data chunk. + */ +static int validate_mixed_generation_chain(struct commit_graph *g) +{ + int read_generation_data = 1; + struct commit_graph *p = g; + + while (read_generation_data && p) { + read_generation_data = p->read_generation_data; + p = p->base_graph; + } + + if (read_generation_data) + return 1; + + while (g) { + g->read_generation_data = 0; + g = g->base_graph; + } + + return 0; +} + static int add_graph_to_chain(struct commit_graph *g, struct commit_graph *chain, struct object_id *oids, @@ -581,6 +606,8 @@ struct commit_graph *load_commit_graph_chain_fd_st(struct repository *r, } } + validate_mixed_generation_chain(graph_chain); + free(oids); fclose(fp); strbuf_release(&line); @@ -605,31 +632,6 @@ static struct commit_graph *load_commit_graph_chain(struct repository *r, return g; } -/* - * returns 1 if and only if all graphs in the chain have - * corrected commit dates stored in the generation_data chunk. - */ -static int validate_mixed_generation_chain(struct commit_graph *g) -{ - int read_generation_data = 1; - struct commit_graph *p = g; - - while (read_generation_data && p) { - read_generation_data = p->read_generation_data; - p = p->base_graph; - } - - if (read_generation_data) - return 1; - - while (g) { - g->read_generation_data = 0; - g = g->base_graph; - } - - return 0; -} - struct commit_graph *read_commit_graph_one(struct repository *r, struct object_directory *odb) { @@ -638,8 +640,6 @@ struct commit_graph *read_commit_graph_one(struct repository *r, if (!g) g = load_commit_graph_chain(r, odb); - validate_mixed_generation_chain(g); - return g; } From patchwork Thu Sep 28 04:38:47 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13401950 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D1266CE7AFB for ; Thu, 28 Sep 2023 04:38:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230127AbjI1Eiw (ORCPT ); Thu, 28 Sep 2023 00:38:52 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46688 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229472AbjI1Eiv (ORCPT ); Thu, 28 Sep 2023 00:38:51 -0400 Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 77C59121 for ; Wed, 27 Sep 2023 21:38:48 -0700 (PDT) Received: (qmail 3690 invoked by uid 109); 28 Sep 2023 04:38:48 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Thu, 28 Sep 2023 04:38:48 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 9663 invoked by uid 111); 28 Sep 2023 04:38:49 -0000 Received: from coredump.intra.peff.net (HELO coredump.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Thu, 28 Sep 2023 00:38:49 -0400 Authentication-Results: peff.net; auth=none Date: Thu, 28 Sep 2023 00:38:47 -0400 From: Jeff King To: git@vger.kernel.org Cc: Taylor Blau , Derrick Stolee Subject: [PATCH v2 3/6] t5324: harmonize sha1/sha256 graph chain corruption Message-ID: <20230928043847.GC58367@coredump.intra.peff.net> References: <20230928043746.GB57926@coredump.intra.peff.net> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20230928043746.GB57926@coredump.intra.peff.net> Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org In t5324.20, we corrupt a hex character 60 bytes into the graph chain file. Since the file consists of two hash identifiers, one per line, the corruption differs between sha1 and sha256. In a sha1 repository, the corruption is on the second line, and in a sha256 repository, it is on the first. We should of course detect the problem with either line. But as the next few patches will show (and fix), that is not the case (in fact, we currently do not exit non-zero for either line!). And while at the end of our series we'll catch all errors, our intermediate states will have differing behavior between the two hashes. Let's make sure we test corruption of both the first and second lines, and do so consistently with either hash by choosing offsets which are always in the first hash (30 bytes) or in the second (70). Signed-off-by: Jeff King --- In v2 we are now testing both first and second lines, instead of just the first. t/t5324-split-commit-graph.sh | 25 ++++++++++++++++++++----- 1 file changed, 20 insertions(+), 5 deletions(-) diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh index 36c4141e67..a9b2428b56 100755 --- a/t/t5324-split-commit-graph.sh +++ b/t/t5324-split-commit-graph.sh @@ -312,15 +312,30 @@ test_expect_success 'warn on base graph chunk incorrect' ' ) ' -test_expect_success 'verify after commit-graph-chain corruption' ' - git clone --no-hardlinks . verify-chain && +test_expect_success 'verify after commit-graph-chain corruption (base)' ' + git clone --no-hardlinks . verify-chain-base && ( - cd verify-chain && - corrupt_file "$graphdir/commit-graph-chain" 60 "G" && + cd verify-chain-base && + corrupt_file "$graphdir/commit-graph-chain" 30 "G" && git commit-graph verify 2>test_err && grep -v "^+" test_err >err && test_i18ngrep "invalid commit-graph chain" err && - corrupt_file "$graphdir/commit-graph-chain" 60 "A" && + corrupt_file "$graphdir/commit-graph-chain" 30 "A" && + git commit-graph verify 2>test_err && + grep -v "^+" test_err >err && + test_i18ngrep "unable to find all commit-graph files" err + ) +' + +test_expect_success 'verify after commit-graph-chain corruption (tip)' ' + git clone --no-hardlinks . verify-chain-tip && + ( + cd verify-chain-tip && + corrupt_file "$graphdir/commit-graph-chain" 70 "G" && + git commit-graph verify 2>test_err && + grep -v "^+" test_err >err && + test_i18ngrep "invalid commit-graph chain" err && + corrupt_file "$graphdir/commit-graph-chain" 70 "A" && git commit-graph verify 2>test_err && grep -v "^+" test_err >err && test_i18ngrep "unable to find all commit-graph files" err From patchwork Thu Sep 28 04:38:59 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13401951 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A0319CE7AFE for ; Thu, 28 Sep 2023 04:39:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230160AbjI1EjE (ORCPT ); Thu, 28 Sep 2023 00:39:04 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48596 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229460AbjI1EjD (ORCPT ); Thu, 28 Sep 2023 00:39:03 -0400 Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 891AA122 for ; Wed, 27 Sep 2023 21:39:01 -0700 (PDT) Received: (qmail 3702 invoked by uid 109); 28 Sep 2023 04:39:01 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Thu, 28 Sep 2023 04:39:01 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 9668 invoked by uid 111); 28 Sep 2023 04:39:02 -0000 Received: from coredump.intra.peff.net (HELO coredump.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Thu, 28 Sep 2023 00:39:02 -0400 Authentication-Results: peff.net; auth=none Date: Thu, 28 Sep 2023 00:38:59 -0400 From: Jeff King To: git@vger.kernel.org Cc: Taylor Blau , Derrick Stolee Subject: [PATCH v2 4/6] commit-graph: detect read errors when verifying graph chain Message-ID: <20230928043859.GD58367@coredump.intra.peff.net> References: <20230928043746.GB57926@coredump.intra.peff.net> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20230928043746.GB57926@coredump.intra.peff.net> Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Because it's OK to not have a graph file at all, the graph_verify() function needs to tell the difference between a missing file and a real error. So when loading a traditional graph file, we call open_commit_graph() separately from load_commit_graph_chain_fd_st(), and don't complain if the first one fails with ENOENT. When the function learned about chain files in 3da4b609bb (commit-graph: verify chains with --shallow mode, 2019-06-18), we couldn't be as careful, since the only way to load a chain was with read_commit_graph_one(), which did both the open/load as a single unit. So we'll miss errors in chain files we load, thinking instead that there was just no chain file at all. Note that we do still report some of these problems to stderr, as the loading function calls error() and warning(). But we'd exit with a successful exit code, which is wrong. We can fix that by using the recently split open/load functions for chains. That lets us treat the chain file just like a single file with respect to error handling here. An existing test (from 3da4b609bb) shows off the problem; we were expecting "commit-graph verify" to report success, but that makes no sense. We did not even verify the contents of the graph data, because we couldn't load it! I don't think this was an intentional exception, but rather just the test covering what happened to occur. Signed-off-by: Jeff King --- builtin/commit-graph.c | 23 ++++++++++++++++------- t/t5324-split-commit-graph.sh | 4 ++-- 2 files changed, 18 insertions(+), 9 deletions(-) diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index c88389df24..50c15d9bfe 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -69,7 +69,8 @@ static int graph_verify(int argc, const char **argv, const char *prefix) struct commit_graph *graph = NULL; struct object_directory *odb = NULL; char *graph_name; - int open_ok; + char *chain_name; + enum { OPENED_NONE, OPENED_GRAPH, OPENED_CHAIN } opened = OPENED_NONE; int fd; struct stat st; int flags = 0; @@ -102,21 +103,29 @@ static int graph_verify(int argc, const char **argv, const char *prefix) odb = find_odb(the_repository, opts.obj_dir); graph_name = get_commit_graph_filename(odb); - open_ok = open_commit_graph(graph_name, &fd, &st); - if (!open_ok && errno != ENOENT) + chain_name = get_commit_graph_chain_filename(odb); + if (open_commit_graph(graph_name, &fd, &st)) + opened = OPENED_GRAPH; + else if (errno != ENOENT) die_errno(_("Could not open commit-graph '%s'"), graph_name); + else if (open_commit_graph_chain(chain_name, &fd, &st)) + opened = OPENED_CHAIN; + else if (errno != ENOENT) + die_errno(_("could not open commit-graph chain '%s'"), chain_name); FREE_AND_NULL(graph_name); + FREE_AND_NULL(chain_name); FREE_AND_NULL(options); - if (open_ok) + if (opened == OPENED_NONE) + return 0; + else if (opened == OPENED_GRAPH) graph = load_commit_graph_one_fd_st(the_repository, fd, &st, odb); else - graph = read_commit_graph_one(the_repository, odb); + graph = load_commit_graph_chain_fd_st(the_repository, fd, &st); - /* Return failure if open_ok predicted success */ if (!graph) - return !!open_ok; + return 1; ret = verify_commit_graph(the_repository, graph, flags); free_commit_graph(graph); diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh index a9b2428b56..a5ac0440f1 100755 --- a/t/t5324-split-commit-graph.sh +++ b/t/t5324-split-commit-graph.sh @@ -317,11 +317,11 @@ test_expect_success 'verify after commit-graph-chain corruption (base)' ' ( cd verify-chain-base && corrupt_file "$graphdir/commit-graph-chain" 30 "G" && - git commit-graph verify 2>test_err && + test_must_fail git commit-graph verify 2>test_err && grep -v "^+" test_err >err && test_i18ngrep "invalid commit-graph chain" err && corrupt_file "$graphdir/commit-graph-chain" 30 "A" && - git commit-graph verify 2>test_err && + test_must_fail git commit-graph verify 2>test_err && grep -v "^+" test_err >err && test_i18ngrep "unable to find all commit-graph files" err ) From patchwork Thu Sep 28 04:39:10 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13401952 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EE7E9CE7AFB for ; Thu, 28 Sep 2023 04:39:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230163AbjI1EjO (ORCPT ); Thu, 28 Sep 2023 00:39:14 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36444 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229472AbjI1EjN (ORCPT ); Thu, 28 Sep 2023 00:39:13 -0400 Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DB2DF192 for ; Wed, 27 Sep 2023 21:39:11 -0700 (PDT) Received: (qmail 3715 invoked by uid 109); 28 Sep 2023 04:39:11 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Thu, 28 Sep 2023 04:39:11 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 9673 invoked by uid 111); 28 Sep 2023 04:39:13 -0000 Received: from coredump.intra.peff.net (HELO coredump.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Thu, 28 Sep 2023 00:39:13 -0400 Authentication-Results: peff.net; auth=none Date: Thu, 28 Sep 2023 00:39:10 -0400 From: Jeff King To: git@vger.kernel.org Cc: Taylor Blau , Derrick Stolee Subject: [PATCH v2 5/6] commit-graph: tighten chain size check Message-ID: <20230928043910.GE58367@coredump.intra.peff.net> References: <20230928043746.GB57926@coredump.intra.peff.net> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20230928043746.GB57926@coredump.intra.peff.net> Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org When we open a commit-graph-chain file, if it's smaller than a single entry, we just quietly treat that as ENOENT. That make some sense if the file is truly zero bytes, but it means that "commit-graph verify" will quietly ignore a file that contains garbage if that garbage happens to be short. Instead, let's only simulate ENOENT when the file is truly empty, and otherwise return EINVAL. The normal graph-loading routines don't care, but "commit-graph verify" will notice and complain about the difference. It's not entirely clear to me that the 0-is-ENOENT case actually happens in real life, so we could perhaps just eliminate this special-case altogether. But this is how we've always behaved, so I'm preserving it in the name of backwards compatibility (though again, it really only matters for "verify", as the regular routines are happy to load what they can). Signed-off-by: Jeff King --- commit-graph.c | 10 ++++++++-- t/t5324-split-commit-graph.sh | 12 ++++++++++++ 2 files changed, 20 insertions(+), 2 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 8b29c6de24..b1d3e5bf94 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -548,9 +548,15 @@ int open_commit_graph_chain(const char *chain_file, close(*fd); return 0; } - if (st->st_size <= the_hash_algo->hexsz) { + if (st->st_size < the_hash_algo->hexsz) { close(*fd); - errno = ENOENT; + if (!st->st_size) { + /* treat empty files the same as missing */ + errno = ENOENT; + } else { + warning("commit-graph chain file too small"); + errno = EINVAL; + } return 0; } return 1; diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh index a5ac0440f1..be58545810 100755 --- a/t/t5324-split-commit-graph.sh +++ b/t/t5324-split-commit-graph.sh @@ -342,6 +342,18 @@ test_expect_success 'verify after commit-graph-chain corruption (tip)' ' ) ' +test_expect_success 'verify notices too-short chain file' ' + git clone --no-hardlinks . verify-chain-short && + ( + cd verify-chain-short && + git commit-graph verify && + echo "garbage" >$graphdir/commit-graph-chain && + test_must_fail git commit-graph verify 2>test_err && + grep -v "^+" test_err >err && + grep "commit-graph chain file too small" err + ) +' + test_expect_success 'verify across alternates' ' git clone --no-hardlinks . verify-alt && ( From patchwork Thu Sep 28 04:39:51 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13401953 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7CE06CE7AFE for ; Thu, 28 Sep 2023 04:39:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230169AbjI1Ejz (ORCPT ); Thu, 28 Sep 2023 00:39:55 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38684 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229453AbjI1Ejy (ORCPT ); Thu, 28 Sep 2023 00:39:54 -0400 Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 147F1121 for ; Wed, 27 Sep 2023 21:39:52 -0700 (PDT) Received: (qmail 3733 invoked by uid 109); 28 Sep 2023 04:39:52 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Thu, 28 Sep 2023 04:39:52 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 9678 invoked by uid 111); 28 Sep 2023 04:39:54 -0000 Received: from coredump.intra.peff.net (HELO coredump.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Thu, 28 Sep 2023 00:39:54 -0400 Authentication-Results: peff.net; auth=none Date: Thu, 28 Sep 2023 00:39:51 -0400 From: Jeff King To: git@vger.kernel.org Cc: Taylor Blau , Derrick Stolee Subject: [PATCH v2 6/6] commit-graph: report incomplete chains during verification Message-ID: <20230928043951.GF58367@coredump.intra.peff.net> References: <20230928043746.GB57926@coredump.intra.peff.net> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20230928043746.GB57926@coredump.intra.peff.net> Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org The load_commit_graph_chain_fd_st() function will stop loading chains when it sees an error. But if it has loaded any graph slice at all, it will return it. This is a good thing for normal use (we use what data we can, and this is just an optimization). But it's a bad thing for "commit-graph verify", which should be careful about finding any irregularities. We do complain to stderr with a warning(), but the verify command still exits with a successful return code. The new tests here cover corruption of both the base and tip slices of the chain. The corruption of the base file already works (it is the first file we look at, so when we see the error we return NULL). The "tip" case is what is fixed by this patch (it complains to stderr but still returns the base slice). Likewise the existing tests for corruption of the commit-graph-chain file itself need to be updated. We already exited non-zero correctly for the "base" case, but the "tip" case can now do so, too. Note that this also causes us to adjust a test later in the file that similarly corrupts a tip (though confusingly the test script calls this "base"). It checks stderr but erroneously expects the whole "verify" command to exit with a successful code. Signed-off-by: Jeff King --- v2 now covers failed get_oid_hex() call on non-base lines. builtin/commit-graph.c | 10 +++++++++- commit-graph.c | 7 +++++-- commit-graph.h | 3 ++- t/t5324-split-commit-graph.sh | 32 +++++++++++++++++++++++++++++--- 4 files changed, 45 insertions(+), 7 deletions(-) diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index 50c15d9bfe..f5e66e9969 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -74,6 +74,7 @@ static int graph_verify(int argc, const char **argv, const char *prefix) int fd; struct stat st; int flags = 0; + int incomplete_chain = 0; int ret; static struct option builtin_commit_graph_verify_options[] = { @@ -122,13 +123,20 @@ static int graph_verify(int argc, const char **argv, const char *prefix) else if (opened == OPENED_GRAPH) graph = load_commit_graph_one_fd_st(the_repository, fd, &st, odb); else - graph = load_commit_graph_chain_fd_st(the_repository, fd, &st); + graph = load_commit_graph_chain_fd_st(the_repository, fd, &st, + &incomplete_chain); if (!graph) return 1; ret = verify_commit_graph(the_repository, graph, flags); free_commit_graph(graph); + + if (incomplete_chain) { + error("one or more commit-graph chain files could not be loaded"); + ret |= 1; + } + return ret; } diff --git a/commit-graph.c b/commit-graph.c index b1d3e5bf94..1a56efcf69 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -563,7 +563,8 @@ int open_commit_graph_chain(const char *chain_file, } struct commit_graph *load_commit_graph_chain_fd_st(struct repository *r, - int fd, struct stat *st) + int fd, struct stat *st, + int *incomplete_chain) { struct commit_graph *graph_chain = NULL; struct strbuf line = STRBUF_INIT; @@ -618,6 +619,7 @@ struct commit_graph *load_commit_graph_chain_fd_st(struct repository *r, fclose(fp); strbuf_release(&line); + *incomplete_chain = !valid; return graph_chain; } @@ -630,8 +632,9 @@ static struct commit_graph *load_commit_graph_chain(struct repository *r, struct commit_graph *g = NULL; if (open_commit_graph_chain(chain_file, &fd, &st)) { + int incomplete; /* ownership of fd is taken over by load function */ - g = load_commit_graph_chain_fd_st(r, fd, &st); + g = load_commit_graph_chain_fd_st(r, fd, &st, &incomplete); } free(chain_file); diff --git a/commit-graph.h b/commit-graph.h index 3b662fd2b5..20ada7e891 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -107,7 +107,8 @@ struct commit_graph *load_commit_graph_one_fd_st(struct repository *r, int fd, struct stat *st, struct object_directory *odb); struct commit_graph *load_commit_graph_chain_fd_st(struct repository *r, - int fd, struct stat *st); + int fd, struct stat *st, + int *incomplete_chain); struct commit_graph *read_commit_graph_one(struct repository *r, struct object_directory *odb); diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh index be58545810..06bb897f02 100755 --- a/t/t5324-split-commit-graph.sh +++ b/t/t5324-split-commit-graph.sh @@ -285,6 +285,32 @@ test_expect_success 'verify hashes along chain, even in shallow' ' ) ' +test_expect_success 'verify notices chain slice which is bogus (base)' ' + git clone --no-hardlinks . verify-chain-bogus-base && + ( + cd verify-chain-bogus-base && + git commit-graph verify && + base_file=$graphdir/graph-$(sed -n 1p $graphdir/commit-graph-chain).graph && + echo "garbage" >$base_file && + test_must_fail git commit-graph verify 2>test_err && + grep -v "^+" test_err >err && + grep "commit-graph file is too small" err + ) +' + +test_expect_success 'verify notices chain slice which is bogus (tip)' ' + git clone --no-hardlinks . verify-chain-bogus-tip && + ( + cd verify-chain-bogus-tip && + git commit-graph verify && + tip_file=$graphdir/graph-$(sed -n 2p $graphdir/commit-graph-chain).graph && + echo "garbage" >$tip_file && + test_must_fail git commit-graph verify 2>test_err && + grep -v "^+" test_err >err && + grep "commit-graph file is too small" err + ) +' + test_expect_success 'verify --shallow does not check base contents' ' git clone --no-hardlinks . verify-shallow && ( @@ -306,7 +332,7 @@ test_expect_success 'warn on base graph chunk incorrect' ' git commit-graph verify && base_file=$graphdir/graph-$(tail -n 1 $graphdir/commit-graph-chain).graph && corrupt_file "$base_file" $(test_oid base) "\01" && - git commit-graph verify --shallow 2>test_err && + test_must_fail git commit-graph verify --shallow 2>test_err && grep -v "^+" test_err >err && test_i18ngrep "commit-graph chain does not match" err ) @@ -332,11 +358,11 @@ test_expect_success 'verify after commit-graph-chain corruption (tip)' ' ( cd verify-chain-tip && corrupt_file "$graphdir/commit-graph-chain" 70 "G" && - git commit-graph verify 2>test_err && + test_must_fail git commit-graph verify 2>test_err && grep -v "^+" test_err >err && test_i18ngrep "invalid commit-graph chain" err && corrupt_file "$graphdir/commit-graph-chain" 70 "A" && - git commit-graph verify 2>test_err && + test_must_fail git commit-graph verify 2>test_err && grep -v "^+" test_err >err && test_i18ngrep "unable to find all commit-graph files" err )