Message ID | pull.687.git.1596181765336.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | commit-graph: add verify changed paths option | expand |
On Fri, Jul 31, 2020 at 9:52 AM Son Luong Ngoc via GitGitGadget <gitgitgadget@gmail.com> wrote: > > From: Son Luong Ngoc <sluongng@gmail.com> > > Add '--has-changed-paths' option to 'git commit-graph verify' subcommand > to validate whether the commit-graph was written with '--changed-paths' > option. > > Signed-off-by: Son Luong Ngoc <sluongng@gmail.com> [...] > It's probably going to take me a bit more time to write up some tests > for this, It would need some documentation too. > so I want to send it out first for comments. [...] > diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c > index 16c9f6101a..ce8a7cbe90 100644 > --- a/builtin/commit-graph.c > +++ b/builtin/commit-graph.c > @@ -18,7 +18,8 @@ static char const * const builtin_commit_graph_usage[] = { > }; > > static const char * const builtin_commit_graph_verify_usage[] = { > - N_("git commit-graph verify [--object-dir <objdir>] [--shallow] [--[no-]progress]"), > + N_("git commit-graph verify [--object-dir <objdir>] [--shallow] " > + "[--has-changed-paths] [--[no-]progress]"), > NULL > }; > > @@ -37,6 +38,7 @@ static struct opts_commit_graph { > int append; > int split; > int shallow; > + int has_changed_paths; > int progress; > int enable_changed_paths; > } opts; > @@ -71,12 +73,14 @@ static int graph_verify(int argc, const char **argv) > int open_ok; > int fd; > struct stat st; > - int flags = 0; > + enum commit_graph_verify_flags flags = 0; > > static struct option builtin_commit_graph_verify_options[] = { > OPT_STRING(0, "object-dir", &opts.obj_dir, > N_("dir"), > N_("The object directory to store the graph")), > + OPT_BOOL(0, "has-changed-paths", &opts.has_changed_paths, > + N_("verify that the commit-graph includes changed paths")), > OPT_BOOL(0, "shallow", &opts.shallow, > N_("if the commit-graph is split, only verify the tip file")), > OPT_BOOL(0, "progress", &opts.progress, N_("force progress reporting")), > @@ -94,8 +98,10 @@ static int graph_verify(int argc, const char **argv) > opts.obj_dir = get_object_directory(); > if (opts.shallow) > flags |= COMMIT_GRAPH_VERIFY_SHALLOW; > + if (opts.has_changed_paths) > + flags |= COMMIT_GRAPH_VERIFY_CHANGED_PATHS; I wonder if OPT_BIT() could be used instead of OPT_BOOL() above to directly set the above flag, as the 'has_changed_paths' field in 'struct opts_commit_graph' seems to be used only for the purpose of setting this flag. > if (opts.progress) > - flags |= COMMIT_GRAPH_WRITE_PROGRESS; > + flags |= COMMIT_GRAPH_VERIFY_PROGRESS; Does this change belong to this patch? I think it would deserve an explanation in the commit message if that's the case. > odb = find_odb(the_repository, opts.obj_dir); > graph_name = get_commit_graph_filename(odb); > diff --git a/commit-graph.c b/commit-graph.c > index 1af68c297d..d83f5a2325 100644 > --- a/commit-graph.c > +++ b/commit-graph.c > @@ -250,7 +250,7 @@ struct commit_graph *load_commit_graph_one_fd_st(int fd, struct stat *st, > return ret; > } > > -static int verify_commit_graph_lite(struct commit_graph *g) > +static int verify_commit_graph_lite(struct commit_graph *g, int verify_changed_path) [...] > -int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags) > +int verify_commit_graph(struct repository *r, > + struct commit_graph *g, > + enum commit_graph_verify_flags flags) It seems to me that it would be more coherent to have both verify_commit_graph() and verify_commit_graph_lite() accept an 'enum commit_graph_verify_flags flags' argument. Right now the "has_changed_paths" option is first an int, then it's converted to a flag and then to an int again before being passed to verify_commit_graph_lite(). It would be simpler if it could be a flag all along. Thanks, Christian.
"Son Luong Ngoc via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Son Luong Ngoc <sluongng@gmail.com> > > Add '--has-changed-paths' option to 'git commit-graph verify' subcommand > to validate whether the commit-graph was written with '--changed-paths' > option. The implementation seems to be only about "does this section exist?" and not "does this section have healthy/uncorrupted data?", which feels a bit strange for "verify". Instead of setting ourselves up to having to add "--has-this-section" and "--has-that-section" every time a new kind of data is added to the system, how about giving the verify command an option to list all the sections found in the file, or a separate "git commit-graph list-sections" subcommand?
On Fri, Jul 31, 2020 at 07:49:25AM +0000, Son Luong Ngoc via GitGitGadget wrote: > From: Son Luong Ngoc <sluongng@gmail.com> > > Add '--has-changed-paths' option to 'git commit-graph verify' subcommand > to validate whether the commit-graph was written with '--changed-paths' > option. Is a single boolean flag sufficient? If you have incrementals, you might have some slices with this chunk and some without. What should the boolean be in that case? I thought we had some way of reporting the number of commits covered by filters, but I can't seem to find it. Our "test-tool read-graph" can report on whether there's a bloom filter chunk, but I think it also doesn't distinguish between different slices (and anyway, it wouldn't be suitable for tools that don't rely on an actual built git.git directory). -Peff
On Fri, Jul 31, 2020 at 10:14:39AM -0700, Junio C Hamano wrote: > "Son Luong Ngoc via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > From: Son Luong Ngoc <sluongng@gmail.com> > > > > Add '--has-changed-paths' option to 'git commit-graph verify' subcommand > > to validate whether the commit-graph was written with '--changed-paths' > > option. > > The implementation seems to be only about "does this section exist?" > and not "does this section have healthy/uncorrupted data?", which > feels a bit strange for "verify". Instead of setting ourselves up > to having to add "--has-this-section" and "--has-that-section" every > time a new kind of data is added to the system, how about giving the > verify command an option to list all the sections found in the file, > or a separate "git commit-graph list-sections" subcommand? Completely agreed. When I suggested that Son work on this, I more had in mind something like 'git commit-graph verify --changed-paths' to mean "verify the integrity of the commit-graph(s), including regenerating changed-path Bloom filters and making sure they match". If you are just curious whether or not the section exists, I'd rather write a script to look for the 'BIDX' or 'BDAT' chunk IDs. That said, if they're spread across incremental, maybe it makes more sense to extend the commit-graph test tool. I dunno. Thanks, Taylor
On Fri, Jul 31, 2020 at 02:02:35PM -0400, Jeff King wrote: > On Fri, Jul 31, 2020 at 07:49:25AM +0000, Son Luong Ngoc via GitGitGadget wrote: > > > From: Son Luong Ngoc <sluongng@gmail.com> > > > > Add '--has-changed-paths' option to 'git commit-graph verify' subcommand > > to validate whether the commit-graph was written with '--changed-paths' > > option. > > Is a single boolean flag sufficient? If you have incrementals, you might > have some slices with this chunk and some without. What should the > boolean be in that case? I think you'd really want to know which layers do and don't have filters. It might be even more interesting to have a tool like what 'git show-index' is to '*.idx' files, maybe something like 'git show-graph' or 'git show-commit-graph'. Its output would be one line per commit that shows: - what layer in the chain it's located at - its graph_pos - its generation number - whether or not it has a Bloom filter - ??? That would be a useful tool for debugging anyway, even outside of the test suite. It would be even better if we could replace the test-tool with it. On an unrelated note; this patch is broken as-is, since it will only report that Bloom filters exist if the top-most graph has them. I have a patch to fix this that I have been meaning to send out for most of this week. I'll try to get to it shortly. > I thought we had some way of reporting the number of commits covered by > filters, but I can't seem to find it. I don't recall having anything like that. > Our "test-tool read-graph" can report on whether there's a bloom filter > chunk, but I think it also doesn't distinguish between different slices > (and anyway, it wouldn't be suitable for tools that don't rely on an > actual built git.git directory). > > -Peff Thanks, Taylor
On Fri, Jul 31, 2020 at 02:09:56PM -0400, Taylor Blau wrote: > > Is a single boolean flag sufficient? If you have incrementals, you might > > have some slices with this chunk and some without. What should the > > boolean be in that case? > > I think you'd really want to know which layers do and don't have > filters. It might be even more interesting to have a tool like what 'git > show-index' is to '*.idx' files, maybe something like 'git show-graph' > or 'git show-commit-graph'. Its output would be one line per commit that > shows: > > - what layer in the chain it's located at > - its graph_pos > - its generation number > - whether or not it has a Bloom filter > - ??? > > That would be a useful tool for debugging anyway, even outside of the > test suite. It would be even better if we could replace the test-tool > with it. Yeah, that was exactly what I had in mind, except that I'd make it a sub-command of "git commit-graph" ("show" or perhaps "dump"). -Peff
Note: re-send to mailing list due to me forgot to turn on Plain Text format. (sorry for the noise) Hi Peff, Taylor, Junio and Christian, Thanks a lot for the valuable feedbacks. This is exactly what I was hoping for by sending out the patch early! > On Jul 31, 2020, at 21:14, Jeff King <peff@peff.net> wrote: > > On Fri, Jul 31, 2020 at 02:09:56PM -0400, Taylor Blau wrote: > >>> Is a single boolean flag sufficient? If you have incrementals, you might >>> have some slices with this chunk and some without. What should the >>> boolean be in that case? >> >> I think you'd really want to know which layers do and don't have >> filters. It might be even more interesting to have a tool like what 'git >> show-index' is to '*.idx' files, maybe something like 'git show-graph' >> or 'git show-commit-graph'. Its output would be one line per commit that >> shows: >> >> - what layer in the chain it's located at >> - its graph_pos >> - its generation number >> - whether or not it has a Bloom filter >> - ??? >> >> That would be a useful tool for debugging anyway, even outside of the >> test suite. It would be even better if we could replace the test-tool >> with it. > > Yeah, that was exactly what I had in mind, except that I'd make it a > sub-command of "git commit-graph" ("show" or perhaps "dump"). I loved Junio's initial suggestion and the follow up here. I was thinking of something like 'git commit-graph verify --verbose' but now I agree that a distinct command such as 'show' might be more distinct and better communicate the purpose. I will stick with my poor-man bash/golang script for now to invalidate the commit-graph (chain or no-chain) as it does the job just fine. Let me see if I have the capacity to implement 'show' sub-command after. ^_^! > > -Peff Cheers, Son Luong.
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index 16c9f6101a..ce8a7cbe90 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -18,7 +18,8 @@ static char const * const builtin_commit_graph_usage[] = { }; static const char * const builtin_commit_graph_verify_usage[] = { - N_("git commit-graph verify [--object-dir <objdir>] [--shallow] [--[no-]progress]"), + N_("git commit-graph verify [--object-dir <objdir>] [--shallow] " + "[--has-changed-paths] [--[no-]progress]"), NULL }; @@ -37,6 +38,7 @@ static struct opts_commit_graph { int append; int split; int shallow; + int has_changed_paths; int progress; int enable_changed_paths; } opts; @@ -71,12 +73,14 @@ static int graph_verify(int argc, const char **argv) int open_ok; int fd; struct stat st; - int flags = 0; + enum commit_graph_verify_flags flags = 0; static struct option builtin_commit_graph_verify_options[] = { OPT_STRING(0, "object-dir", &opts.obj_dir, N_("dir"), N_("The object directory to store the graph")), + OPT_BOOL(0, "has-changed-paths", &opts.has_changed_paths, + N_("verify that the commit-graph includes changed paths")), OPT_BOOL(0, "shallow", &opts.shallow, N_("if the commit-graph is split, only verify the tip file")), OPT_BOOL(0, "progress", &opts.progress, N_("force progress reporting")), @@ -94,8 +98,10 @@ static int graph_verify(int argc, const char **argv) opts.obj_dir = get_object_directory(); if (opts.shallow) flags |= COMMIT_GRAPH_VERIFY_SHALLOW; + if (opts.has_changed_paths) + flags |= COMMIT_GRAPH_VERIFY_CHANGED_PATHS; if (opts.progress) - flags |= COMMIT_GRAPH_WRITE_PROGRESS; + flags |= COMMIT_GRAPH_VERIFY_PROGRESS; odb = find_odb(the_repository, opts.obj_dir); graph_name = get_commit_graph_filename(odb); diff --git a/commit-graph.c b/commit-graph.c index 1af68c297d..d83f5a2325 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -250,7 +250,7 @@ struct commit_graph *load_commit_graph_one_fd_st(int fd, struct stat *st, return ret; } -static int verify_commit_graph_lite(struct commit_graph *g) +static int verify_commit_graph_lite(struct commit_graph *g, int verify_changed_path) { /* * Basic validation shared between parse_commit_graph() @@ -276,6 +276,16 @@ static int verify_commit_graph_lite(struct commit_graph *g) error("commit-graph is missing the Commit Data chunk"); return 1; } + if (verify_changed_path) { + if (!g->chunk_bloom_indexes) { + error("commit-graph is missing Bloom Index chunk"); + return 1; + } + if (!g->chunk_bloom_data) { + error("commit-graph is missing Bloom Data chunk"); + return 1; + } + } return 0; } @@ -439,7 +449,7 @@ struct commit_graph *parse_commit_graph(void *graph_map, size_t graph_size) hashcpy(graph->oid.hash, graph->data + graph->data_len - graph->hash_len); - if (verify_commit_graph_lite(graph)) + if (verify_commit_graph_lite(graph, 0)) goto free_and_return; return graph; @@ -2216,7 +2226,9 @@ static void graph_report(const char *fmt, ...) #define GENERATION_ZERO_EXISTS 1 #define GENERATION_NUMBER_EXISTS 2 -int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags) +int verify_commit_graph(struct repository *r, + struct commit_graph *g, + enum commit_graph_verify_flags flags) { uint32_t i, cur_fanout_pos = 0; struct object_id prev_oid, cur_oid, checksum; @@ -2231,7 +2243,7 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags) return 1; } - verify_commit_graph_error = verify_commit_graph_lite(g); + verify_commit_graph_error = verify_commit_graph_lite(g, flags & COMMIT_GRAPH_VERIFY_CHANGED_PATHS); if (verify_commit_graph_error) return verify_commit_graph_error; @@ -2284,7 +2296,7 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags) if (verify_commit_graph_error & ~VERIFY_COMMIT_GRAPH_ERROR_HASH) return verify_commit_graph_error; - if (flags & COMMIT_GRAPH_WRITE_PROGRESS) + if (flags & COMMIT_GRAPH_VERIFY_PROGRESS) progress = start_progress(_("Verifying commits in commit graph"), g->num_commits); diff --git a/commit-graph.h b/commit-graph.h index 28f89cdf3e..29c01b5000 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -94,6 +94,12 @@ enum commit_graph_write_flags { COMMIT_GRAPH_WRITE_BLOOM_FILTERS = (1 << 3), }; +enum commit_graph_verify_flags { + COMMIT_GRAPH_VERIFY_SHALLOW = (1 << 0), + COMMIT_GRAPH_VERIFY_CHANGED_PATHS = (1 << 1), + COMMIT_GRAPH_VERIFY_PROGRESS = (1 << 2), +}; + enum commit_graph_split_flags { COMMIT_GRAPH_SPLIT_UNSPECIFIED = 0, COMMIT_GRAPH_SPLIT_MERGE_PROHIBITED = 1, @@ -122,9 +128,9 @@ int write_commit_graph(struct object_directory *odb, enum commit_graph_write_flags flags, const struct split_commit_graph_opts *split_opts); -#define COMMIT_GRAPH_VERIFY_SHALLOW (1 << 0) - -int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags); +int verify_commit_graph(struct repository *r, + struct commit_graph *g, + enum commit_graph_verify_flags flags); void close_commit_graph(struct raw_object_store *); void free_commit_graph(struct commit_graph *);