diff mbox series

[5/8] commit-graph: read `BIDX` chunk with `pair_chunk_expect()`

Message ID 45cac29403e63483951f7766c6da3c022c68d9f0.1697225110.git.me@ttaylorr.com (mailing list archive)
State New, archived
Headers show
Series chunk-format: introduce `pair_chunk_expect()` convenience API | expand

Commit Message

Taylor Blau Oct. 13, 2023, 7:25 p.m. UTC
Perform an identical conversion as in previous commits to read the BIDX
chunk.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 commit-graph.c | 18 ++++--------------
 1 file changed, 4 insertions(+), 14 deletions(-)

Comments

Taylor Blau Oct. 13, 2023, 7:49 p.m. UTC | #1
On Fri, Oct 13, 2023 at 03:25:30PM -0400, Taylor Blau wrote:
> Perform an identical conversion as in previous commits to read the BIDX
> chunk.
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  commit-graph.c | 18 ++++--------------
>  1 file changed, 4 insertions(+), 14 deletions(-)

Oops. This fails t4216 because it changes the warning() message, which I
thought I excluded from this patch :-<.

Here is a replacement that passes the test. I can reroll the entire
"series" if we decide that this is a worthwhile direction to go in:

--- 8< ---

Subject: [PATCH] commit-graph: read `BIDX` chunk with `pair_chunk_expect()`

Perform an identical conversion as in previous commits to read the BIDX
chunk.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 commit-graph.c | 18 ++++--------------
 1 file changed, 4 insertions(+), 14 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 0fab99f5dd..66c2e628d8 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -330,18 +330,6 @@ static int graph_read_oid_lookup(const unsigned char *chunk_start,
 	return 0;
 }

-static int graph_read_bloom_index(const unsigned char *chunk_start,
-				  size_t chunk_size, void *data)
-{
-	struct commit_graph *g = data;
-	if (chunk_size != g->num_commits * 4) {
-		warning("commit-graph changed-path index chunk is too small");
-		return -1;
-	}
-	g->chunk_bloom_indexes = chunk_start;
-	return 0;
-}
-
 static int graph_read_bloom_data(const unsigned char *chunk_start,
 				  size_t chunk_size, void *data)
 {
@@ -461,8 +449,10 @@ struct commit_graph *parse_commit_graph(struct repo_settings *s,
 	}

 	if (s->commit_graph_read_changed_paths) {
-		read_chunk(cf, GRAPH_CHUNKID_BLOOMINDEXES,
-			   graph_read_bloom_index, graph);
+		if (pair_chunk_expect(cf, GRAPH_CHUNKID_BLOOMINDEXES,
+				      &graph->chunk_bloom_indexes,
+				      st_mult(graph->num_commits, 4)) == -1)
+			warning(_("commit-graph changed-path index chunk is too small"));
 		read_chunk(cf, GRAPH_CHUNKID_BLOOMDATA,
 			   graph_read_bloom_data, graph);
 	}
--
2.38.0.16.g393fd4c6db
Junio C Hamano Oct. 14, 2023, 4:10 p.m. UTC | #2
Taylor Blau <me@ttaylorr.com> writes:

> -static int graph_read_bloom_index(const unsigned char *chunk_start,
> -				  size_t chunk_size, void *data)
> -{
> -	struct commit_graph *g = data;
> -	if (chunk_size != g->num_commits * 4) {
> -		warning("commit-graph changed-path index chunk is too small");
> -		return -1;
> -	}
> ...
> @@ -461,8 +449,10 @@ struct commit_graph *parse_commit_graph(struct repo_settings *s,
>  	}
>  
>  	if (s->commit_graph_read_changed_paths) {
> +		if (pair_chunk_expect(cf, GRAPH_CHUNKID_BLOOMINDEXES,
> +				      &graph->chunk_bloom_indexes,
> +				      st_mult(graph->num_commits, 4)) == -1)
> +			warning(_("commit-graph changed-path index chunk is too small (%d)"), graph->num_commits * 4);
>  		read_chunk(cf, GRAPH_CHUNKID_BLOOMDATA,
>  			   graph_read_bloom_data, graph);
>  	}

Overall the series looked sane, but the need for each caller to
supply error messages, when the helper perfectly well knows how many
bytes the caller expected and how many bytes there are avaiable, was
a bit disturbing, as the level of detail given per each caller will
inevitably become uneven.  Even now, some give an error() while
others give a warning(), even though I suspect all of them should be
data errors.

I wonder if it makes sense to stuff the message template in the
pair_chunk_data structure and do

static int pair_chunk_expect_fn(const unsigned char *chunk_start,
				size_t chunk_size,
				void *data)
{
	struct pair_chunk_data *pcd = data;
	if (pcd->expected_size != chunk_size)
		return error(_(pcd->message), pcd->expected_size, chunk_size);
	*pcd->p = chunk_start;
	return 0;
}
Jeff King Oct. 20, 2023, 10:31 a.m. UTC | #3
On Sat, Oct 14, 2023 at 09:10:22AM -0700, Junio C Hamano wrote:

> > @@ -461,8 +449,10 @@ struct commit_graph *parse_commit_graph(struct repo_settings *s,
> >  	}
> >  
> >  	if (s->commit_graph_read_changed_paths) {
> > +		if (pair_chunk_expect(cf, GRAPH_CHUNKID_BLOOMINDEXES,
> > +				      &graph->chunk_bloom_indexes,
> > +				      st_mult(graph->num_commits, 4)) == -1)
> > +			warning(_("commit-graph changed-path index chunk is too small (%d)"), graph->num_commits * 4);
> >  		read_chunk(cf, GRAPH_CHUNKID_BLOOMDATA,
> >  			   graph_read_bloom_data, graph);
> >  	}
> 
> Overall the series looked sane, but the need for each caller to
> supply error messages, when the helper perfectly well knows how many
> bytes the caller expected and how many bytes there are avaiable, was
> a bit disturbing, as the level of detail given per each caller will
> inevitably become uneven.  Even now, some give an error() while
> others give a warning(), even though I suspect all of them should be
> data errors.
> 
> I wonder if it makes sense to stuff the message template in the
> pair_chunk_data structure and do
> 
> static int pair_chunk_expect_fn(const unsigned char *chunk_start,
> 				size_t chunk_size,
> 				void *data)
> {
> 	struct pair_chunk_data *pcd = data;
> 	if (pcd->expected_size != chunk_size)
> 		return error(_(pcd->message), pcd->expected_size, chunk_size);
> 	*pcd->p = chunk_start;
> 	return 0;
> }

One issue with the series as-is is that the "chunk is too small"
messages can be misleading when the chunk is in fact missing. We do say
"missing or corrupt" in the die message (at least for midx; I did not
update the similar ones for commit-graph), but the explicit "too small"
for a missing chunk seems to me to cross the line.

The caller can distinguish the cases by the actual value returned from
pair_chunk_expect(), but doing so makes the code a lot longer and harder
to read.

Your suggestion above takes care of it naturally (in the same way that
the existing code does, which basically is emitting the same message in
each read_chunk callback).

-Peff
diff mbox series

Patch

diff --git a/commit-graph.c b/commit-graph.c
index 0fab99f5dd..ad98f6334d 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -330,18 +330,6 @@  static int graph_read_oid_lookup(const unsigned char *chunk_start,
 	return 0;
 }
 
-static int graph_read_bloom_index(const unsigned char *chunk_start,
-				  size_t chunk_size, void *data)
-{
-	struct commit_graph *g = data;
-	if (chunk_size != g->num_commits * 4) {
-		warning("commit-graph changed-path index chunk is too small");
-		return -1;
-	}
-	g->chunk_bloom_indexes = chunk_start;
-	return 0;
-}
-
 static int graph_read_bloom_data(const unsigned char *chunk_start,
 				  size_t chunk_size, void *data)
 {
@@ -461,8 +449,10 @@  struct commit_graph *parse_commit_graph(struct repo_settings *s,
 	}
 
 	if (s->commit_graph_read_changed_paths) {
-		read_chunk(cf, GRAPH_CHUNKID_BLOOMINDEXES,
-			   graph_read_bloom_index, graph);
+		if (pair_chunk_expect(cf, GRAPH_CHUNKID_BLOOMINDEXES,
+				      &graph->chunk_bloom_indexes,
+				      st_mult(graph->num_commits, 4)) == -1)
+			warning(_("commit-graph changed-path index chunk is too small (%d)"), graph->num_commits * 4);
 		read_chunk(cf, GRAPH_CHUNKID_BLOOMDATA,
 			   graph_read_bloom_data, graph);
 	}