Message ID | 0821a8073a48067ecd9ce08226656fa04d803f6b.1568216234.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | multi-pack-index: add --no-progress | expand |
"William Baker via GitGitGadget" <gitgitgadget@gmail.com> writes: > [verse] > -'git multi-pack-index' [--object-dir=<dir>] <subcommand> > +'git multi-pack-index' [--object-dir=<dir>] <subcommand> [--[no-]progress] I am wondering what the reasoning behind having this new one *after* the subcommand while the existing one *before* is. Isn't the --[no-]progress option supported by all subcommands of the multi-pack-index command, just like the --object-dir=<dir> option is? If there is no reason that explains why one must come before and the other must come after the subcommand, we are then adding another thing the end-users or scriptors need to memorize, which is not ideal. > +--[no-]progress:: > + Turn progress on/off explicitly. If neither is specified, progress is > + shown if standard error is connected to a terminal. Sounds sensible. > diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c > index b1ea1a6aa1..f8b2a74179 100644 > --- a/builtin/multi-pack-index.c > +++ b/builtin/multi-pack-index.c > @@ -6,34 +6,41 @@ > #include "trace2.h" > > static char const * const builtin_multi_pack_index_usage[] = { > - N_("git multi-pack-index [--object-dir=<dir>] (write|verify|expire|repack --batch-size=<size>)"), > + N_("git multi-pack-index [--object-dir=<dir>] (write|verify|expire|repack --batch-size=<size>) [--[no-]progress]"), > NULL > }; The same comment as the SYNOPSIS part. > static struct opts_multi_pack_index { > const char *object_dir; > unsigned long batch_size; > + int progress; > } opts; > > int cmd_multi_pack_index(int argc, const char **argv, > const char *prefix) > { > + unsigned flags = 0; > + > static struct option builtin_multi_pack_index_options[] = { > OPT_FILENAME(0, "object-dir", &opts.object_dir, > N_("object directory containing set of packfile and pack-index pairs")), > OPT_MAGNITUDE(0, "batch-size", &opts.batch_size, > N_("during repack, collect pack-files of smaller size into a batch that is larger than this size")), > + OPT_BOOL(0, "progress", &opts.progress, N_("force progress reporting")), > OPT_END(), > }; Seeing that all the options that made me curious (see above) are defined here for all subcommands, I suspect that "technically", all options must come before the <subcommand> (side note: I know parse-options may reorder commands after options, but in the documentation and usage, we strongly discourage users from relying on the reordering and always show "global --options, subcommand, and then subcommand --options" order). I also see in the code that handles opts.batch_size that there is a workaround for this inverted code structure to make sure subcommands other than repack does not allow --batch-size option specified. Unless and until we get rid of the "git multi-pack-index" as a separate command (side note: when it happens, we;'d call the underlying midx API functions directly from appropriate places in the codepaths like "gc"), we probably would want to correct the use of parse_options() API in the implementation of this command before adding any new option or subcommand. > @@ -47,14 +54,15 @@ int cmd_multi_pack_index(int argc, const char **argv, > trace2_cmd_mode(argv[0]); > > if (!strcmp(argv[0], "repack")) > - return midx_repack(the_repository, opts.object_dir, (size_t)opts.batch_size); > + return midx_repack(the_repository, opts.object_dir, > + (size_t)opts.batch_size, flags); > if (opts.batch_size) > die(_("--batch-size option is only for 'repack' subcommand")); > > if (!strcmp(argv[0], "write")) > return write_midx_file(opts.object_dir); > if (!strcmp(argv[0], "verify")) > - return verify_midx_file(the_repository, opts.object_dir); > + return verify_midx_file(the_repository, opts.object_dir, flags); > if (!strcmp(argv[0], "expire")) > return expire_midx_packs(the_repository, opts.object_dir); We can see that the new option only affects "verify", even though the SYNOPSIS and usage text pretends that everybody understands and reacts to it. Shouldn't it be documented just like how --batch-size is documented that it is understood only by "repack"? If the mid-term aspiration of this patch is to later enhance other subcommands to also understand the progress output or verbosity option (and if the excuse given as a response to the above analysis is "this is just a first step, more will come later"), the instead of adding a "unsigned flag" local variable to the function, it would probably make much more sense to (1) make struct opts_multi_pack_index as a part of the public API between cmd_multi_pack_index() and midx.c and document it in midx.h; (2) instead of passing opts.object_dir to existing command implementations, pass &opts, the pointer to the whole structure; (3) add a new field "unsigned progress" to the structure, and teach the command line parser to flip it upon seeing "--[no-]progress".
Hi Junio, Thanks for the review! On 9/12/19 1:17 PM, Junio C Hamano wrote: >> +'git multi-pack-index' [--object-dir=<dir>] <subcommand> [--[no-]progress] > > I am wondering what the reasoning behind having this new one *after* > the subcommand while the existing one *before* is. Isn't the > --[no-]progress option supported by all subcommands of the > multi-pack-index command, just like the --object-dir=<dir> option > is? > > always show "global --options, subcommand, and then subcommand --options" order). Thanks for calling this out. I didn't have a specific reason for making this option appear after the subcommand. I tried looking at other commands as examples and I missed that there's a specific ordering based on the type of the option. I will clean this up in the next patch. > I also see in the code that > handles opts.batch_size that there is a workaround for this inverted > code structure to make sure subcommands other than repack does not > allow --batch-size option specified. > we probably would want to correct the use > of parse_options() API in the implementation of this command before > adding any new option or subcommand. To confirm I understand, is the recommendation that cmd_multi_pack_index be updated to only parse "batch-size" for the repack subcommand (i.e. use PARSE_OPT_STOP_AT_NON_OPTION to parse all of the common options, and then only parse "batch-size" when the repack subcommand is running)? >> @@ -47,14 +54,15 @@ int cmd_multi_pack_index(int argc, const char **argv, >> trace2_cmd_mode(argv[0]); >> >> if (!strcmp(argv[0], "repack")) >> - return midx_repack(the_repository, opts.object_dir, (size_t)opts.batch_size); >> + return midx_repack(the_repository, opts.object_dir, >> + (size_t)opts.batch_size, flags); >> if (opts.batch_size) >> die(_("--batch-size option is only for 'repack' subcommand")); >> >> if (!strcmp(argv[0], "write")) >> return write_midx_file(opts.object_dir); >> if (!strcmp(argv[0], "verify")) >> - return verify_midx_file(the_repository, opts.object_dir); >> + return verify_midx_file(the_repository, opts.object_dir, flags); >> if (!strcmp(argv[0], "expire")) >> return expire_midx_packs(the_repository, opts.object_dir); > > We can see that the new option only affects "verify", even though > the SYNOPSIS and usage text pretends that everybody understands and > reacts to it. Shouldn't it be documented just like how --batch-size > is documented that it is understood only by "repack"? > > If the mid-term aspiration of this patch is to later enhance other > subcommands to also understand the progress output or verbosity > option (and if the excuse given as a response to the above analysis > is "this is just a first step, more will come later") Yep this was my thinking. Today "repack" and "verify" are the only subcommands that have any progress output but as the other subcommands learn how to provide progress the [--[no-]progress] option can be used to control it. > instead of adding a "unsigned flag" local variable to the function, it would > probably make much more sense to > > (1) make struct opts_multi_pack_index as a part of the public API > between cmd_multi_pack_index() and midx.c and document it in > midx.h; > > (2) instead of passing opts.object_dir to existing command > implementations, pass &opts, the pointer to the whole > structure; > > (3) add a new field "unsigned progress" to the structure, and teach > the command line parser to flip it upon seeing "--[no-]progress". > Thanks for this suggestion I'll use this approach in the next patch. One small point to clarify, in the current struct I'm using an int for "progress" because OPT_BOOL expects an int*. Do you have any concerns with keeping "progress" as an int in the public struct, or would you rather cmd_multi_pack_index parse an int internally and use it to populate the public struct? Thanks! William
William Baker <williamtbakeremail@gmail.com> writes: >> I also see in the code that >> handles opts.batch_size that there is a workaround for this inverted >> code structure to make sure subcommands other than repack does not >> allow --batch-size option specified. > >> we probably would want to correct the use >> of parse_options() API in the implementation of this command before >> adding any new option or subcommand. > > To confirm I understand, is the recommendation that > cmd_multi_pack_index be updated to only parse "batch-size" for the repack > subcommand (i.e. use PARSE_OPT_STOP_AT_NON_OPTION to parse all of the common > options, and then only parse "batch-size" when the repack subcommand is running)? Compare the ways how dispatching and command line option parsing of subcommands in "multi-pack-index" and "commit-graph" are implemented. When a command (e.g. "commit-graph") takes common options and also has subcommands (e.g. "read" and "write") that take different set of options, there is a common options parser in the primary entry point (e.g. "cmd_commit_graph()"), and after dispatching to a chosen subcommand, the implementation of each subcommand (e.g. "graph_read()" and "graph_write()") parses its own options. That's bog-standard way. cmd_multi_pack_index() "cheats" and does not implement proper subcommand dispatch (it directly makes underlying API calls instead). Started as an isolated experimental command whose existence as a standalone command is solely because it was easier to experiment with (as opposed to being a plumbing command to be used by scripters), it probably was an acceptable trade-off to leave the code in this shape. In the longer run, I suspect we'd rather want to get rid of "git multi-pack-index" as a standalone command and instead make "git gc" and other commands make direct calls to the internal machinery of midx code from strategic places. So in that sense, I am not sure if I should "recommend" fixing the way the subcommand dispatching works in this command, or just accept to keep the ugly technical debt and let it limp along... >> subcommands to also understand the progress output or verbosity >> option (and if the excuse given as a response to the above analysis >> is "this is just a first step, more will come later") > > Yep this was my thinking. Today "repack" and "verify" are the only subcommands > that have any progress output but as the other subcommands learn how to provide > progress the [--[no-]progress] option can be used to control it. > >> instead of adding a "unsigned flag" local variable to the function, it would >> probably make much more sense to >> >> (1) make struct opts_multi_pack_index as a part of the public API >> between cmd_multi_pack_index() and midx.c and document it in >> midx.h; >> >> (2) instead of passing opts.object_dir to existing command >> implementations, pass &opts, the pointer to the whole >> structure; >> >> (3) add a new field "unsigned progress" to the structure, and teach >> the command line parser to flip it upon seeing "--[no-]progress". >> > > Thanks for this suggestion I'll use this approach in the next patch. ...but it appears to me that you are more enthused than I am in improving this code, so I probably should actually recommend fixing the main entry point function of this command to imitate the way "commit-graph" implements subcommands and their own set of options ;-)
On 9/13/19 1:26 PM, Junio C Hamano wrote: > Compare the ways how dispatching and command line option parsing of > subcommands in "multi-pack-index" and "commit-graph" are > implemented. When a command (e.g. "commit-graph") takes common > options and also has subcommands (e.g. "read" and "write") that take > different set of options, there is a common options parser in the > primary entry point (e.g. "cmd_commit_graph()"), and after > dispatching to a chosen subcommand, the implementation of each > subcommand (e.g. "graph_read()" and "graph_write()") parses its own > options. That's bog-standard way. Thanks for the pointer to "commit-graph", looking through that code cleared up any questions I had. After taking another look through the "multi-pack-index" code my plan is to update all of the subcommands to understand [--[no-]progress]. I gave the public struct in midx.h approach a try, but after seeing how that looks I think it would be cleaner to update "write" and "expire" to display progress and have an explicit parameter. > Started as an isolated experimental command whose > existence as a standalone command is solely because it was easier to > experiment with (as opposed to being a plumbing command to be used > by scripters), it probably was an acceptable trade-off to leave the > code in this shape. In the longer run, I suspect we'd rather want > to get rid of "git multi-pack-index" as a standalone command and > instead make "git gc" and other commands make direct calls to the > internal machinery of midx code from strategic places. So in that > sense, I am not sure if I should "recommend" fixing the way the > subcommand dispatching works in this command, or just accept to keep > the ugly technical debt and let it limp along... Thanks for the background here, it helped with understanding why the multi-pack-index parsing is different than the other commands. My plan is to include 3 commits in the next (v2) patch series: 1. Adding the new parameters to midx.h/c to control progress 2. Update write/expire to display progress 3. Update the multi-pack-index.c builtin to parse the [--[no-]progress] option and update the tests. I wasn't thinking that I would adjust the the subcommand dispatching in multi-pack-index.c in this patch series. By updating all of the subcommands to support [--[no-]progress] I should be able to keep the changes to multi-pack-index.c quite small. If you see any potential issues with this approach please let me know. Thanks, William
diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt index 233b2b7862..19a5a42dc0 100644 --- a/Documentation/git-multi-pack-index.txt +++ b/Documentation/git-multi-pack-index.txt @@ -9,7 +9,7 @@ git-multi-pack-index - Write and verify multi-pack-indexes SYNOPSIS -------- [verse] -'git multi-pack-index' [--object-dir=<dir>] <subcommand> +'git multi-pack-index' [--object-dir=<dir>] <subcommand> [--[no-]progress] DESCRIPTION ----------- @@ -23,6 +23,10 @@ OPTIONS `<dir>/packs/multi-pack-index` for the current MIDX file, and `<dir>/packs` for the pack-files to index. +--[no-]progress:: + Turn progress on/off explicitly. If neither is specified, progress is + shown if standard error is connected to a terminal. + The following subcommands are available: write:: diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c index b1ea1a6aa1..f8b2a74179 100644 --- a/builtin/multi-pack-index.c +++ b/builtin/multi-pack-index.c @@ -6,34 +6,41 @@ #include "trace2.h" static char const * const builtin_multi_pack_index_usage[] = { - N_("git multi-pack-index [--object-dir=<dir>] (write|verify|expire|repack --batch-size=<size>)"), + N_("git multi-pack-index [--object-dir=<dir>] (write|verify|expire|repack --batch-size=<size>) [--[no-]progress]"), NULL }; static struct opts_multi_pack_index { const char *object_dir; unsigned long batch_size; + int progress; } opts; int cmd_multi_pack_index(int argc, const char **argv, const char *prefix) { + unsigned flags = 0; + static struct option builtin_multi_pack_index_options[] = { OPT_FILENAME(0, "object-dir", &opts.object_dir, N_("object directory containing set of packfile and pack-index pairs")), OPT_MAGNITUDE(0, "batch-size", &opts.batch_size, N_("during repack, collect pack-files of smaller size into a batch that is larger than this size")), + OPT_BOOL(0, "progress", &opts.progress, N_("force progress reporting")), OPT_END(), }; git_config(git_default_config, NULL); + opts.progress = isatty(2); argc = parse_options(argc, argv, prefix, builtin_multi_pack_index_options, builtin_multi_pack_index_usage, 0); if (!opts.object_dir) opts.object_dir = get_object_directory(); + if (opts.progress) + flags |= MIDX_PROGRESS; if (argc == 0) usage_with_options(builtin_multi_pack_index_usage, @@ -47,14 +54,15 @@ int cmd_multi_pack_index(int argc, const char **argv, trace2_cmd_mode(argv[0]); if (!strcmp(argv[0], "repack")) - return midx_repack(the_repository, opts.object_dir, (size_t)opts.batch_size); + return midx_repack(the_repository, opts.object_dir, + (size_t)opts.batch_size, flags); if (opts.batch_size) die(_("--batch-size option is only for 'repack' subcommand")); if (!strcmp(argv[0], "write")) return write_midx_file(opts.object_dir); if (!strcmp(argv[0], "verify")) - return verify_midx_file(the_repository, opts.object_dir); + return verify_midx_file(the_repository, opts.object_dir, flags); if (!strcmp(argv[0], "expire")) return expire_midx_packs(the_repository, opts.object_dir); diff --git a/midx.c b/midx.c index d649644420..b1a9ad9e5b 100644 --- a/midx.c +++ b/midx.c @@ -1077,19 +1077,20 @@ static int compare_pair_pos_vs_id(const void *_a, const void *_b) display_progress(progress, _n); \ } while (0) -int verify_midx_file(struct repository *r, const char *object_dir) +int verify_midx_file(struct repository *r, const char *object_dir, unsigned flags) { struct pair_pos_vs_id *pairs = NULL; uint32_t i; - struct progress *progress; + struct progress *progress = NULL; struct multi_pack_index *m = load_multi_pack_index(object_dir, 1); verify_midx_error = 0; if (!m) return 0; - progress = start_progress(_("Looking for referenced packfiles"), - m->num_packs); + if (flags & MIDX_PROGRESS) + progress = start_progress(_("Looking for referenced packfiles"), + m->num_packs); for (i = 0; i < m->num_packs; i++) { if (prepare_midx_pack(r, m, i)) midx_report("failed to load pack in position %d", i); @@ -1107,8 +1108,9 @@ int verify_midx_file(struct repository *r, const char *object_dir) i, oid_fanout1, oid_fanout2, i + 1); } - progress = start_sparse_progress(_("Verifying OID order in MIDX"), - m->num_objects - 1); + if (flags & MIDX_PROGRESS) + progress = start_sparse_progress(_("Verifying OID order in MIDX"), + m->num_objects - 1); for (i = 0; i < m->num_objects - 1; i++) { struct object_id oid1, oid2; @@ -1135,13 +1137,15 @@ int verify_midx_file(struct repository *r, const char *object_dir) pairs[i].pack_int_id = nth_midxed_pack_int_id(m, i); } - progress = start_sparse_progress(_("Sorting objects by packfile"), - m->num_objects); + if (flags & MIDX_PROGRESS) + progress = start_sparse_progress(_("Sorting objects by packfile"), + m->num_objects); display_progress(progress, 0); /* TODO: Measure QSORT() progress */ QSORT(pairs, m->num_objects, compare_pair_pos_vs_id); stop_progress(&progress); - progress = start_sparse_progress(_("Verifying object offsets"), m->num_objects); + if (flags & MIDX_PROGRESS) + progress = start_sparse_progress(_("Verifying object offsets"), m->num_objects); for (i = 0; i < m->num_objects; i++) { struct object_id oid; struct pack_entry e; @@ -1316,7 +1320,7 @@ static int fill_included_packs_batch(struct repository *r, return 0; } -int midx_repack(struct repository *r, const char *object_dir, size_t batch_size) +int midx_repack(struct repository *r, const char *object_dir, size_t batch_size, unsigned flags) { int result = 0; uint32_t i; @@ -1341,6 +1345,12 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size) strbuf_addstr(&base_name, object_dir); strbuf_addstr(&base_name, "/pack/pack"); argv_array_push(&cmd.args, base_name.buf); + + if (flags & MIDX_PROGRESS) + argv_array_push(&cmd.args, "--progress"); + else + argv_array_push(&cmd.args, "-q"); + strbuf_release(&base_name); cmd.git_cmd = 1; diff --git a/midx.h b/midx.h index f0ae656b5d..88abc85bc3 100644 --- a/midx.h +++ b/midx.h @@ -37,6 +37,8 @@ struct multi_pack_index { char object_dir[FLEX_ARRAY]; }; +#define MIDX_PROGRESS (1 << 0) + struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local); int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t pack_int_id); int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, uint32_t *result); @@ -49,9 +51,9 @@ int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, i int write_midx_file(const char *object_dir); void clear_midx_file(struct repository *r); -int verify_midx_file(struct repository *r, const char *object_dir); +int verify_midx_file(struct repository *r, const char *object_dir, unsigned flags); int expire_midx_packs(struct repository *r, const char *object_dir); -int midx_repack(struct repository *r, const char *object_dir, size_t batch_size); +int midx_repack(struct repository *r, const char *object_dir, size_t batch_size, unsigned flags); void close_midx(struct multi_pack_index *m); diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index c72ca04399..cdc09b85f1 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -169,6 +169,21 @@ test_expect_success 'verify multi-pack-index success' ' git multi-pack-index verify --object-dir=$objdir ' +test_expect_success 'verify progress off for redirected stderr' ' + git multi-pack-index verify --object-dir=$objdir 2>err && + test_line_count = 0 err +' + +test_expect_success 'verify force progress on for stderr' ' + git multi-pack-index verify --object-dir=$objdir --progress 2>err && + test_file_not_empty err +' + +test_expect_success 'verify with the --no-progress option' ' + git multi-pack-index verify --object-dir=$objdir --no-progress 2>err && + test_line_count = 0 err +' + # usage: corrupt_midx_and_verify <pos> <data> <objdir> <string> corrupt_midx_and_verify() { POS=$1 && @@ -284,6 +299,21 @@ test_expect_success 'git-fsck incorrect offset' ' "git -c core.multipackindex=true fsck" ' +test_expect_success 'repack progress off for redirected stderr' ' + git multi-pack-index repack --object-dir=$objdir 2>err && + test_line_count = 0 err +' + +test_expect_success 'repack force progress on for stderr' ' + git multi-pack-index repack --object-dir=$objdir --progress 2>err && + test_file_not_empty err +' + +test_expect_success 'repack with the --no-progress option' ' + git multi-pack-index repack --object-dir=$objdir --no-progress 2>err && + test_line_count = 0 err +' + test_expect_success 'repack removes multi-pack-index' ' test_path_is_file $objdir/pack/multi-pack-index && GIT_TEST_MULTI_PACK_INDEX=0 git repack -adf &&