diff mbox series

[04/20] commit-graph: check size of oid fanout chunk

Message ID 20231009205951.GD3282181@coredump.intra.peff.net (mailing list archive)
State Accepted
Commit 52e2e8d43dbae8c05b68efd60cde2aacf3a23890
Headers show
Series bounds-checks for chunk-based files | expand

Commit Message

Jeff King Oct. 9, 2023, 8:59 p.m. UTC
We load the oid fanout chunk with pair_chunk(), which means we never see
the size of the chunk. We just assume the on-disk file uses the
appropriate size, and if it's too small we'll access random memory.

It's easy to check this up-front; the fanout always consists of 256
uint32's, since it is a fanout of the first byte of the hash pointing
into the oid index. These parameters can't be changed without
introducing a new chunk type.

This matches the similar check in the midx OIDF chunk (but note that
rather than checking for the error immediately, the graph code just
leaves parts of the struct NULL and checks for required fields later).

Signed-off-by: Jeff King <peff@peff.net>
---
 commit-graph.c          | 13 +++++++++++--
 t/t5318-commit-graph.sh | 26 ++++++++++++++++++++++++++
 2 files changed, 37 insertions(+), 2 deletions(-)

Comments

Taylor Blau Oct. 11, 2023, 12:08 a.m. UTC | #1
On Mon, Oct 09, 2023 at 04:59:51PM -0400, Jeff King wrote:
> We load the oid fanout chunk with pair_chunk(), which means we never see
> the size of the chunk. We just assume the on-disk file uses the
> appropriate size, and if it's too small we'll access random memory.
>
> It's easy to check this up-front; the fanout always consists of 256
> uint32's, since it is a fanout of the first byte of the hash pointing
> into the oid index. These parameters can't be changed without
> introducing a new chunk type.

Cool, this is the first patch that should start reducing our usage of
the new pair_chunk_unsafe() and hardening these reads. Let's take a
look...

> This matches the similar check in the midx OIDF chunk (but note that
> rather than checking for the error immediately, the graph code just
> leaves parts of the struct NULL and checks for required fields later).
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  commit-graph.c          | 13 +++++++++++--
>  t/t5318-commit-graph.sh | 26 ++++++++++++++++++++++++++
>  2 files changed, 37 insertions(+), 2 deletions(-)
>
> diff --git a/commit-graph.c b/commit-graph.c
> index a689a55b79..9b3b01da61 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -305,6 +305,16 @@ static int verify_commit_graph_lite(struct commit_graph *g)
>  	return 0;
>  }
>
> +static int graph_read_oid_fanout(const unsigned char *chunk_start,
> +				 size_t chunk_size, void *data)
> +{
> +	struct commit_graph *g = data;
> +	if (chunk_size != 256 * sizeof(uint32_t))
> +		return error("commit-graph oid fanout chunk is wrong size");

Should we mark this string for translation?

> +	g->chunk_oid_fanout = (const uint32_t *)chunk_start;
> +	return 0;
> +}
> +

Nice. This makes sense and seems like an obvious improvement over the
existing code.

I wonder how common this pattern is. We have read_chunk() which is for
handling more complex scenarios than this. But the safe version of
pair_chunk() really just wants to check that the size of the chunk is as
expected and assign the location in the mmap to some pointer.

Do you think it would be worth changing pair_chunk() to take an expected
size_t and handle this generically? I.e. have a version of
chunk-format::pair_chunk_fn() that looks something like:

    static int pair_chunk_fn(const unsigned char *chunk_start,
                             size_t chunk_size, void *data)
    {
        const unsigned char **p = data;
        if (chunk_size != data->size)
            return -1;
        *p = chunk_start;
        return 0;
    }

and then our call here would be:

  if (pair_chunk(cf, GRAPH_CHUNKID_OIDFANOUT,
                 (const unsigned char **)&graph->chunk_oid_fanout,
                 256 * sizeof(uint32_t)) < 0)
      return error("commit-graph oid fanout chunk is wrong size");

I dunno. It's hard to have a more concrete recomendation without having
read the rest of the series. So it's possible that this is just complete
nonsense ;-). But my hunch is that there are a number of callers that
would benefit from having this built in.

Thanks,
Taylor
Taylor Blau Oct. 11, 2023, 1:24 a.m. UTC | #2
On Tue, Oct 10, 2023 at 08:08:05PM -0400, Taylor Blau wrote:
> Do you think it would be worth changing pair_chunk() to take an expected
> size_t and handle this generically? I.e. have a version of
> chunk-format::pair_chunk_fn() that looks something like:
>
>     static int pair_chunk_fn(const unsigned char *chunk_start,
>                              size_t chunk_size, void *data)
>     {
>         const unsigned char **p = data;
>         if (chunk_size != data->size)
>             return -1;
>         *p = chunk_start;
>         return 0;
>     }
>
> and then our call here would be:
>
>   if (pair_chunk(cf, GRAPH_CHUNKID_OIDFANOUT,
>                  (const unsigned char **)&graph->chunk_oid_fanout,
>                  256 * sizeof(uint32_t)) < 0)
>       return error("commit-graph oid fanout chunk is wrong size");

I took a brief stab at implementing this tonight and came up with this,
which applies on top of this patch. Looking through the rest of the
series briefly[^1], I think having something like this would be useful:

--- 8< ---
diff --git a/chunk-format.c b/chunk-format.c
index 8d910e23f6..38b0f847be 100644
--- a/chunk-format.c
+++ b/chunk-format.c
@@ -157,6 +157,8 @@ int read_table_of_contents(struct chunkfile *cf,
 struct pair_chunk_data {
 	const unsigned char **p;
 	size_t *size;
+
+	size_t expected_size;
 };

 static int pair_chunk_fn(const unsigned char *chunk_start,
@@ -169,6 +171,20 @@ static int pair_chunk_fn(const unsigned char *chunk_start,
 	return 0;
 }

+static int pair_chunk_expect_fn(const unsigned char *chunk_start,
+				size_t chunk_size,
+				void *data)
+{
+	struct pair_chunk_data *pcd = data;
+	if (pcd->expected_size != chunk_size)
+		return error(_("mismatched chunk size, got: %"PRIuMAX", wanted:"
+			       " %"PRIuMAX),
+			     (uintmax_t)chunk_size,
+			     (uintmax_t)pcd->expected_size);
+	*pcd->p = chunk_start;
+	return 0;
+}
+
 int pair_chunk(struct chunkfile *cf,
 	       uint32_t chunk_id,
 	       const unsigned char **p,
@@ -178,6 +194,14 @@ int pair_chunk(struct chunkfile *cf,
 	return read_chunk(cf, chunk_id, pair_chunk_fn, &pcd);
 }

+int pair_chunk_expect(struct chunkfile *cf,
+		      uint32_t chunk_id, const unsigned char **p,
+		      size_t expected_size)
+{
+	struct pair_chunk_data pcd = { .p = p, .expected_size = expected_size };
+	return read_chunk(cf, chunk_id, pair_chunk_expect_fn, &pcd);
+}
+
 int pair_chunk_unsafe(struct chunkfile *cf,
 		      uint32_t chunk_id,
 		      const unsigned char **p)
diff --git a/chunk-format.h b/chunk-format.h
index 8dce7967f4..778f81f555 100644
--- a/chunk-format.h
+++ b/chunk-format.h
@@ -53,6 +53,10 @@ int pair_chunk(struct chunkfile *cf,
 	       const unsigned char **p,
 	       size_t *size);

+int pair_chunk_expect(struct chunkfile *cf,
+		      uint32_t chunk_id, const unsigned char **p,
+		      size_t expected_size);
+
 /*
  * Unsafe version of pair_chunk; it does not return the size,
  * meaning that the caller cannot possibly be careful about
diff --git a/commit-graph.c b/commit-graph.c
index 9b3b01da61..ed85161fdb 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -305,16 +305,6 @@ static int verify_commit_graph_lite(struct commit_graph *g)
 	return 0;
 }

-static int graph_read_oid_fanout(const unsigned char *chunk_start,
-				 size_t chunk_size, void *data)
-{
-	struct commit_graph *g = data;
-	if (chunk_size != 256 * sizeof(uint32_t))
-		return error("commit-graph oid fanout chunk is wrong size");
-	g->chunk_oid_fanout = (const uint32_t *)chunk_start;
-	return 0;
-}
-
 static int graph_read_oid_lookup(const unsigned char *chunk_start,
 				 size_t chunk_size, void *data)
 {
@@ -404,7 +394,10 @@ struct commit_graph *parse_commit_graph(struct repo_settings *s,
 				   GRAPH_HEADER_SIZE, graph->num_chunks))
 		goto free_and_return;

-	read_chunk(cf, GRAPH_CHUNKID_OIDFANOUT, graph_read_oid_fanout, graph);
+	if (pair_chunk_expect(cf, GRAPH_CHUNKID_OIDFANOUT,
+			      (const unsigned char **)&graph->chunk_oid_fanout,
+			      256 * sizeof(uint32_t)) < 0)
+		error(_("commit-graph oid fanout chunk is wrong size"));
 	read_chunk(cf, GRAPH_CHUNKID_OIDLOOKUP, graph_read_oid_lookup, graph);
 	pair_chunk_unsafe(cf, GRAPH_CHUNKID_DATA, &graph->chunk_commit_data);
 	pair_chunk_unsafe(cf, GRAPH_CHUNKID_EXTRAEDGES, &graph->chunk_extra_edges);
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index d25bea3ec5..467b5f5e9c 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -841,6 +841,7 @@ test_expect_success 'reader notices too-small oid fanout chunk' '
 	# otherwise we hit an earlier check
 	check_corrupt_chunk OIDF clear $(printf "000000%02x" $(test_seq 250)) &&
 	cat >expect.err <<-\EOF &&
+	error: mismatched chunk size, got: 1000, wanted: 1024
 	error: commit-graph oid fanout chunk is wrong size
 	error: commit-graph is missing the OID Fanout chunk
 	EOF
--- >8 ---

Or, quite honestly, having the "expected_size" parameter be required in
the safe version of `pair_chunk()`.

Thanks,
Taylor

[^1]: A non-brief review is on my to-do list for tomorrow.
Jeff King Oct. 11, 2023, 11:01 p.m. UTC | #3
On Tue, Oct 10, 2023 at 08:08:05PM -0400, Taylor Blau wrote:

> Nice. This makes sense and seems like an obvious improvement over the
> existing code.
> 
> I wonder how common this pattern is. We have read_chunk() which is for
> handling more complex scenarios than this. But the safe version of
> pair_chunk() really just wants to check that the size of the chunk is as
> expected and assign the location in the mmap to some pointer.

Sometimes yes, sometimes no. For fixed-size ones like this, that's
sufficient. For others we have to record the size and use it for later
bounds-checking. IIRC it's about 50/50 between the two.

> Do you think it would be worth changing pair_chunk() to take an expected
> size_t and handle this generically? I.e. have a version of
> chunk-format::pair_chunk_fn() that looks something like:
> 
>     static int pair_chunk_fn(const unsigned char *chunk_start,
>                              size_t chunk_size, void *data)
>     {
>         const unsigned char **p = data;
>         if (chunk_size != data->size)
>             return -1;
>         *p = chunk_start;
>         return 0;
>     }
> 
> and then our call here would be:
> 
>   if (pair_chunk(cf, GRAPH_CHUNKID_OIDFANOUT,
>                  (const unsigned char **)&graph->chunk_oid_fanout,
>                  256 * sizeof(uint32_t)) < 0)
>       return error("commit-graph oid fanout chunk is wrong size");
> 
> I dunno. It's hard to have a more concrete recomendation without having
> read the rest of the series. So it's possible that this is just complete
> nonsense ;-). But my hunch is that there are a number of callers that
> would benefit from having this built in.

I don't think it's nonsense, and I do think other callers would benefit.
On the other hand, I kind of like the notion that there is a complete
validation callback for each of these chunks. Even though it just checks
the size for now, it could handle other things. In the case of OIDF, for
example, we can check whether the entries are monotonic. It's just that
we happen to do those checks elsewhere.

Hmm, actually, looking at that again, I think I may have missed a case
in patch 6. For pack .idx files, we check the fanout table when they are
loaded. And patch 6 adds the same for commit-graph files. I thought midx
files were handled the same .idx, but looking at it again, I only see
the monotonicity check in the "multi-pack-index verify" code paths. So
it might need the same treatment.

I'm not sure how I missed that (I started by making a corrupted midx
first and couldn't get it to fail, which is when I discovered the
existing checks, but maybe I am mixing up .idx and midx in my memory).

-Peff
diff mbox series

Patch

diff --git a/commit-graph.c b/commit-graph.c
index a689a55b79..9b3b01da61 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -305,6 +305,16 @@  static int verify_commit_graph_lite(struct commit_graph *g)
 	return 0;
 }
 
+static int graph_read_oid_fanout(const unsigned char *chunk_start,
+				 size_t chunk_size, void *data)
+{
+	struct commit_graph *g = data;
+	if (chunk_size != 256 * sizeof(uint32_t))
+		return error("commit-graph oid fanout chunk is wrong size");
+	g->chunk_oid_fanout = (const uint32_t *)chunk_start;
+	return 0;
+}
+
 static int graph_read_oid_lookup(const unsigned char *chunk_start,
 				 size_t chunk_size, void *data)
 {
@@ -394,8 +404,7 @@  struct commit_graph *parse_commit_graph(struct repo_settings *s,
 				   GRAPH_HEADER_SIZE, graph->num_chunks))
 		goto free_and_return;
 
-	pair_chunk_unsafe(cf, GRAPH_CHUNKID_OIDFANOUT,
-		   (const unsigned char **)&graph->chunk_oid_fanout);
+	read_chunk(cf, GRAPH_CHUNKID_OIDFANOUT, graph_read_oid_fanout, graph);
 	read_chunk(cf, GRAPH_CHUNKID_OIDLOOKUP, graph_read_oid_lookup, graph);
 	pair_chunk_unsafe(cf, GRAPH_CHUNKID_DATA, &graph->chunk_commit_data);
 	pair_chunk_unsafe(cf, GRAPH_CHUNKID_EXTRAEDGES, &graph->chunk_extra_edges);
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index ba65f17dd9..d25bea3ec5 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -2,6 +2,7 @@ 
 
 test_description='commit graph'
 . ./test-lib.sh
+. "$TEST_DIRECTORY"/lib-chunk.sh
 
 GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=0
 
@@ -821,4 +822,29 @@  test_expect_success 'overflow during generation version upgrade' '
 	)
 '
 
+corrupt_chunk () {
+	graph=full/.git/objects/info/commit-graph &&
+	test_when_finished "rm -rf $graph" &&
+	git -C full commit-graph write --reachable &&
+	corrupt_chunk_file $graph "$@"
+}
+
+check_corrupt_chunk () {
+	corrupt_chunk "$@" &&
+	git -C full -c core.commitGraph=false log >expect.out &&
+	git -C full -c core.commitGraph=true log >out 2>err &&
+	test_cmp expect.out out
+}
+
+test_expect_success 'reader notices too-small oid fanout chunk' '
+	# make it big enough that the graph file is plausible,
+	# otherwise we hit an earlier check
+	check_corrupt_chunk OIDF clear $(printf "000000%02x" $(test_seq 250)) &&
+	cat >expect.err <<-\EOF &&
+	error: commit-graph oid fanout chunk is wrong size
+	error: commit-graph is missing the OID Fanout chunk
+	EOF
+	test_cmp expect.err err
+'
+
 test_done