Message ID | 20220128183319.43496-1-johncai86@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC,v3] cat-file: add a --stdin-cmd mode | expand |
On 29/01/22 01.33, John Cai wrote: > Future improvements: > - a non-trivial part of "cat-file --batch" time is spent > on parsing its argument and seeing if it's a revision, ref etc. So we > could add a command that only accepts a full-length 40 > character SHA-1. I think the full hash is actually revision name.
Bagas Sanjaya <bagasdotme@gmail.com> writes: > On 29/01/22 01.33, John Cai wrote: >> Future improvements: >> - a non-trivial part of "cat-file --batch" time is spent >> on parsing its argument and seeing if it's a revision, ref etc. So we >> could add a command that only accepts a full-length 40 >> character SHA-1. > > I think the full hash is actually revision name. There is no entry for "revision name" in Documentation/glossary-content.txt ;-) But to John, if you have a loop that feedseach line to get_oid(), while (getline(buf)) { struct object_id oid; if (get_oid(buf, &oid)) warn and continue; use oid; } is it much slower than a mode that can ONLY handle a full object name input, i.e. while (getline(buf)) { struct object_id oid; if (get_oid_hex(buf, &oid)) warn and continue; use oid; } when your input is restricted to full object names? get_oid() == repo_get_oid() -> get_oid_with_context() -> get_oid_with_context_1() -> get_oid_1() -> peel_onion() -> get_oid_basic() -> get_oid_hex() -> repo_dwim_ref() it seems that warn_ambiguous_refs and warn_on_object_refname_ambiguity we would waste time on refname discovery but I see cat-file already has some provision to disable this check. So when we do not need to call repo_dwim_ref(), do we still spend measurable cycles in this callchain?
On Mon, Jan 31, 2022 at 9:34 PM John Cai <johncai86@gmail.com> wrote: > > This RFC patch proposes a new flag --batch-command that works with > git-cat-file --batch. The subject is "Re: [RFC v3] cat-file: add a --stdin-cmd mode" and now you are talking about '--batch-command' instead of '--stdin-cmd'. "that works with git-cat-file --batch" is not very clear. Maybe you could find a wording that explains better how --batch-command is different from --batch. Also I think at this point this should probably not be an RFC patch anymore but a regular one. > Similar to git-update-ref --stdin, it will accept > commands and arguments from stdin. > > The start of this idea was discussed in [1], where the original > motivation was to be able to control when the buffer was flushed to > stdout in --buffer mode. That would be nice in a cover letter but I am not sure a commit message is the right place for this. > However, this can actually be much more useful in situations when > git-cat-file --batch is being used as a long lived backend query > process. At GitLab, we use a pair of cat-file processes. One for > iterating over object metadata with --batch-check, and the other to grab > object contents with --batch. However, if we had --batch-command, we could > get rid of the second --batch-check process, Maybe s/second// would make it clear that there are no two --batch-command processes. > and just have one process > where we can flip between getting object info, and getting object contents. > This can lead to huge savings since on a given server there could be hundreds to > thousands of git cat-file processes at a time. It's not clear if all the git cat-file processes you are talking about are mostly --batch-check processes or --batch processes, or a roughly equal amount of both. My guess is the latter and that --batch-command would mean that there would be around two times fewer cat-file processes. > git cat-file --batch-command > > $ <command> [arg1] [arg2] NL It's a bit unclear what the 2 above lines mean. Maybe you could add a small explanation like for example "The new flag can be used like this:" and "It receives commands from stdin in the format:" Also not sure why there is a '$' char in front of '<command> [arg1] [arg2] NL' but not in front of 'git cat-file --batch-command'. It doesn't look like in the 'git update-ref --stdin' doc that '$' are used in front of the commands that can be passed through stdin. > This patch adds three commands: object, info, fflush Maybe s/three commands/the following first three commands/ > $ object <sha1> NL > $ info <sha1> NL > $ fflush NL Idem about '$'. > These three would be immediately useful in GitLab's context, but one can > imagine this mode to be further extended for other things. Not very clear which "mode" you are talking about. You have been talking about a mode only in the subject so far. Maybe you should talk a bit about that above when '<command> [arg1] [arg2] NL' is introduced. Also you don't talk about the output format. --batch and --batch-check accept [=<format>], but it looks like --batch-command doesn't. > Future improvements: > - a non-trivial part of "cat-file --batch" time is spent > on parsing its argument and seeing if it's a revision, ref etc. So we > could add a command that only accepts a full-length 40 > character SHA-1. In a cover letter that would be ok, but I am not sure that a commit message is the best place for this kind of details about future work. > This would be the first step in adding such an interface to > git-cat-file. > > [1] https://lore.kernel.org/git/pull.1124.git.git.1636149400.gitgitgadget@gmail.com/ > > Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> > Signed-off-by: John Cai <johncai86@gmail.com> > --- > > Taylor, I'd be interested in your thoughts on this proposal since you helped > review the previous patch that turned into this RFC. Thanks! > > Changes from v2: > > - refactored tests to be within run_tests() > - added a test to test --buffer behavior with fflush > - cleaned up cat-file.c: clarified var names, removed unnecessary code > based on suggestions from Phillip Wood > - removed strvec changes > > Changes from v1: > > - changed option name to batch-command. > - changed command function interface to receive the whole line after the command > name to put the onus of parsing arguments to each individual command function. > - pass in whole line to batch_one_object in both parse_cmd_object and > parse_cmd_info to support spaces in the object reference. > - removed addition of -z to include in a separate patch series > - added documentation. > --- > Documentation/git-cat-file.txt | 15 +++++ > builtin/cat-file.c | 114 +++++++++++++++++++++++++++++++-- > t/t1006-cat-file.sh | 83 ++++++++++++++++++++++++ > 3 files changed, 205 insertions(+), 7 deletions(-) > > diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt > index bef76f4dd0..254e546c79 100644 > --- a/Documentation/git-cat-file.txt > +++ b/Documentation/git-cat-file.txt > @@ -96,6 +96,21 @@ OPTIONS > need to specify the path, separated by whitespace. See the > section `BATCH OUTPUT` below for details. > > +--batch-command:: > + Enter a command mode that reads from stdin. Maybe s/a command mode that reads from stdin/a mode that reads commands from stdin/ Also I would expect something about the output, like perhaps "...and ouputs the command results to stdout". > May not be combined with any > + other options or arguments except `--textconv` or `--filters`, in which > + case the input lines also need to specify the path, separated by > + whitespace. See the section `BATCH OUTPUT` below for details. The BATCH OUTPUT section says that a format can be passed but that doesn't seem to be the case with --batch-command. So you might need to make some changes to that section too or add a bit more details about the output here. > +object <object>:: > + Print object contents for object reference <object> > + > +info <object>:: > + Print object info for object reference <object> > + > +flush:: > + Flush to stdout immediately when used with --buffer > + > --batch-all-objects:: > Instead of reading a list of objects on stdin, perform the > requested batch operation on all objects in the repository and > diff --git a/builtin/cat-file.c b/builtin/cat-file.c > index 7b3f42950e..cc9e47943b 100644 > --- a/builtin/cat-file.c > +++ b/builtin/cat-file.c > @@ -24,9 +24,11 @@ struct batch_options { > int buffer_output; > int all_objects; > int unordered; > - int cmdmode; /* may be 'w' or 'c' for --filters or --textconv */ > + int mode; /* may be 'w' or 'c' for --filters or --textconv */ > const char *format; > + int command; > }; Maybe add a blank line here. > +static char line_termination = '\n'; > > static const char *force_path; > > @@ -302,19 +304,19 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d > if (data->type == OBJ_BLOB) { > if (opt->buffer_output) > fflush(stdout); > - if (opt->cmdmode) { > + if (opt->mode) { The mechanical s/cmdmode/mode/g change could have been made in a preparatory patch to make this patch a bit smaller and easier to digest. > +static void batch_objects_command(struct batch_options *opt, > + struct strbuf *output, > + struct expand_data *data) > +{ > + struct strbuf input = STRBUF_INIT; > + > + /* Read each line dispatch its command */ > + while (!strbuf_getwholeline(&input, stdin, line_termination)) { > + int i; > + const struct parse_cmd *cmd = NULL; > + const char *p, *cmd_end; > + > + if (*input.buf == line_termination) > + die("empty command in input"); > + else if (isspace(*input.buf)) > + die("whitespace before command: %s", input.buf); > + > + for (i = 0; i < ARRAY_SIZE(commands); i++) { > + const char *prefix = commands[i].prefix; > + char c; > + if (!skip_prefix(input.buf, prefix, &cmd_end)) > + continue; > + /* > + * If the command has arguments, verify that it's > + * followed by a space. Otherwise, it shall be followed > + * by a line terminator. > + */ > + c = commands[i].takes_args ? ' ' : line_termination; > + if (input.buf[strlen(prefix)] != c) > + die("arguments invalid for command: %s", commands[i].prefix); > + > + cmd = &commands[i]; > + if (cmd->takes_args) { > + p = cmd_end + 1; > + // strip newline before handing it to the > + // handling function So above the /* */ comments delimiters are used but here // is used. I am not sure we support // these days, but if we do, I think it would be better to avoid mixing comment styles in the same function. > + input.buf[strcspn(input.buf, "\n")] = '\0'; > + } > + > + break; > + } > + > + if (!cmd) > + die("unknown command: %s", input.buf); > + > + cmd->fn(opt, p, output, data); > + } > + strbuf_release(&input); > +} > @@ -590,6 +682,9 @@ static int batch_objects(struct batch_options *opt) > save_warning = warn_on_object_refname_ambiguity; > warn_on_object_refname_ambiguity = 0; > > + if (command) > + batch_objects_command(opt, &output, &data); > + > while (strbuf_getline(&input, stdin) != EOF) { I think batch_objects_command() will consume everything from stdin, so it doesn't make sense to try to read again from stdin after it. Maybe the whole while (...) { ... } clause should be inside an else clause or something. > if (data.split_on_whitespace) { > /* > @@ -636,6 +731,7 @@ static int batch_option_callback(const struct option *opt, > > bo->enabled = 1; > bo->print_contents = !strcmp(opt->long_name, "batch"); > + bo->command = !strcmp(opt->long_name, "batch-command"); > bo->format = arg; > > return 0;
On Mon, Jan 31 2022, Junio C Hamano wrote: > Bagas Sanjaya <bagasdotme@gmail.com> writes: > >> On 29/01/22 01.33, John Cai wrote: >>> Future improvements: >>> - a non-trivial part of "cat-file --batch" time is spent >>> on parsing its argument and seeing if it's a revision, ref etc. So we >>> could add a command that only accepts a full-length 40 >>> character SHA-1. >> >> I think the full hash is actually revision name. > > There is no entry for "revision name" in Documentation/glossary-content.txt > ;-) > > But to John, if you have a loop that feedseach line to get_oid(), > > while (getline(buf)) { > struct object_id oid; > if (get_oid(buf, &oid)) > warn and continue; > use oid; > } > > is it much slower than a mode that can ONLY handle a full object > name input, i.e. > > while (getline(buf)) { > struct object_id oid; > if (get_oid_hex(buf, &oid)) > warn and continue; > use oid; > } > > when your input is restricted to full object names? > > get_oid() == repo_get_oid() > -> get_oid_with_context() > -> get_oid_with_context_1() > -> get_oid_1() > -> peel_onion() > -> get_oid_basic() > -> get_oid_hex() > -> repo_dwim_ref() > > it seems that warn_ambiguous_refs and warn_on_object_refname_ambiguity > we would waste time on refname discovery but I see cat-file already > has some provision to disable this check. So when we do not need to > call repo_dwim_ref(), do we still spend measurable cycles in this > callchain? For what it's worth I think this claim that we spend a non-trivial amount of time on the difference between these two comes from me originally. I'd had a chat with John about various things to try out in such a "cat-file --batch" mode, and this was one of those things. I tried instrumenting the relevant code in builtin/cat-file.c the other (but forgot to reply to this thread), and whatever I'd found there at the time (this was weeks/months ago) I couldn't reproduce. So there's probably nothing worthwhile to check out here, i.e. the trivial cost of get_oid_with_context() is probably nothing to worry about.
Hi John On 28/01/2022 18:33, John Cai wrote: > This RFC patch proposes a new flag --batch-command that works with > git-cat-file --batch. Similar to git-update-ref --stdin, it will accept > commands and arguments from stdin. > > The start of this idea was discussed in [1], where the original > motivation was to be able to control when the buffer was flushed to > stdout in --buffer mode. > > However, this can actually be much more useful in situations when > git-cat-file --batch is being used as a long lived backend query > process. At GitLab, we use a pair of cat-file processes. One for > iterating over object metadata with --batch-check, and the other to grab > object contents with --batch. However, if we had --batch-command, we could > get rid of the second --batch-check process, and just have one process > where we can flip between getting object info, and getting object contents. > This can lead to huge savings since on a given server there could be hundreds to > thousands of git cat-file processes at a time. > > git cat-file --batch-command > > $ <command> [arg1] [arg2] NL > > This patch adds three commands: object, info, fflush > > $ object <sha1> NL > $ info <sha1> NL > $ fflush NL > > These three would be immediately useful in GitLab's context, but one can > imagine this mode to be further extended for other things. I think this is moving in the right direction, I've left some comments below. > Future improvements: > - a non-trivial part of "cat-file --batch" time is spent > on parsing its argument and seeing if it's a revision, ref etc. So we > could add a command that only accepts a full-length 40 > character SHA-1. > > This would be the first step in adding such an interface to > git-cat-file. > > [1] https://lore.kernel.org/git/pull.1124.git.git.1636149400.gitgitgadget@gmail.com/ > > Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> > Signed-off-by: John Cai <johncai86@gmail.com> > --- > > Taylor, I'd be interested in your thoughts on this proposal since you helped > review the previous patch that turned into this RFC. Thanks! > > Changes from v2: > > - refactored tests to be within run_tests() > - added a test to test --buffer behavior with fflush > - cleaned up cat-file.c: clarified var names, removed unnecessary code > based on suggestions from Phillip Wood > - removed strvec changes > > Changes from v1: > > - changed option name to batch-command. > - changed command function interface to receive the whole line after the command > name to put the onus of parsing arguments to each individual command function. > - pass in whole line to batch_one_object in both parse_cmd_object and > parse_cmd_info to support spaces in the object reference. > - removed addition of -z to include in a separate patch series > - added documentation. > --- > Documentation/git-cat-file.txt | 15 +++++ > builtin/cat-file.c | 114 +++++++++++++++++++++++++++++++-- > t/t1006-cat-file.sh | 83 ++++++++++++++++++++++++ > 3 files changed, 205 insertions(+), 7 deletions(-) > > diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt > index bef76f4dd0..254e546c79 100644 > --- a/Documentation/git-cat-file.txt > +++ b/Documentation/git-cat-file.txt > @@ -96,6 +96,21 @@ OPTIONS > need to specify the path, separated by whitespace. See the > section `BATCH OUTPUT` below for details. > > +--batch-command:: > + Enter a command mode that reads from stdin. May not be combined with any > + other options or arguments except `--textconv` or `--filters`, in which > + case the input lines also need to specify the path, separated by > + whitespace. See the section `BATCH OUTPUT` below for details. > + > +object <object>:: > + Print object contents for object reference <object> > + > +info <object>:: > + Print object info for object reference <object> > + > +flush:: I think this is a better name than fflush as calling fflush is just an implementation detail. I've been thinking about this and have some concerns with the current implementation as it allows the caller to force a flush but the output may be flushed at other times and so it does not allow for deadlock free reading and writing to cat-file from a single process. If instead we buffered the input lines and only processed then when we received a flush command it would be easy to read and write to cat-file from a single process without deadlocks. This would mean that callers would have to add flush commands to get any output before they closed the pipe writing to cat-file. > + Flush to stdout immediately when used with --buffer > + > --batch-all-objects:: > Instead of reading a list of objects on stdin, perform the > requested batch operation on all objects in the repository and > diff --git a/builtin/cat-file.c b/builtin/cat-file.c > index 7b3f42950e..cc9e47943b 100644 > --- a/builtin/cat-file.c > +++ b/builtin/cat-file.c > @@ -24,9 +24,11 @@ struct batch_options { > int buffer_output; > int all_objects; > int unordered; > - int cmdmode; /* may be 'w' or 'c' for --filters or --textconv */ > + int mode; /* may be 'w' or 'c' for --filters or --textconv */ If you really need to rename this variable then doing that in a separate preparatory patch would make this easier to review as it separates out two separate sets of changes. > const char *format; > + int command; Can't we just call this batch_command and leave cmdmode alone? I don't know maybe cmdmode would be confused with having something to do with batch_command then. > }; > +static char line_termination = '\n'; This seems unnecessary as it is the only permitted line ending > static const char *force_path; > > @@ -302,19 +304,19 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d > if (data->type == OBJ_BLOB) { > if (opt->buffer_output) > fflush(stdout); > - if (opt->cmdmode) { > + if (opt->mode) { > char *contents; > unsigned long size; > > if (!data->rest) > die("missing path for '%s'", oid_to_hex(oid)); > > - if (opt->cmdmode == 'w') { > + if (opt->mode == 'w') { > if (filter_object(data->rest, 0100644, oid, > &contents, &size)) > die("could not convert '%s' %s", > oid_to_hex(oid), data->rest); > - } else if (opt->cmdmode == 'c') { > + } else if (opt->mode == 'c') { > enum object_type type; > if (!textconv_object(the_repository, > data->rest, 0100644, oid, > @@ -326,7 +328,7 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d > die("could not convert '%s' %s", > oid_to_hex(oid), data->rest); > } else > - BUG("invalid cmdmode: %c", opt->cmdmode); > + BUG("invalid mode: %c", opt->mode); > batch_write(opt, contents, size); > free(contents); > } else { > @@ -508,6 +510,95 @@ static int batch_unordered_packed(const struct object_id *oid, > data); > } > > +static void parse_cmd_object(struct batch_options *opt, > + const char *line, > + struct strbuf *output, > + struct expand_data *data) > +{ > + opt->print_contents = 1; > + batch_one_object(line, output, opt, data); > +} > + > +static void parse_cmd_info(struct batch_options *opt, > + const char *line, > + struct strbuf *output, > + struct expand_data *data) > +{ > + opt->print_contents = 0; > + batch_one_object(line, output, opt, data); > +} > + > +static void parse_cmd_fflush(struct batch_options *opt, > + const char *line, > + struct strbuf *output, > + struct expand_data *data) > +{ > + fflush(stdout); > +} > + > +typedef void (*parse_cmd_fn_t)(struct batch_options *, const char *, > + struct strbuf *, struct expand_data *); > + > +static const struct parse_cmd { > + const char *prefix; > + parse_cmd_fn_t fn; > + unsigned takes_args; > +} commands[] = { > + { "object", parse_cmd_object, 1}, > + { "info", parse_cmd_info, 1}, > + { "fflush", parse_cmd_fflush, 0}, > +}; > + > +static void batch_objects_command(struct batch_options *opt, > + struct strbuf *output, > + struct expand_data *data) > +{ > + struct strbuf input = STRBUF_INIT; > + > + /* Read each line dispatch its command */ > + while (!strbuf_getwholeline(&input, stdin, line_termination)) { > + int i; > + const struct parse_cmd *cmd = NULL; > + const char *p, *cmd_end; > + > + if (*input.buf == line_termination) > + die("empty command in input"); > + else if (isspace(*input.buf)) > + die("whitespace before command: %s", input.buf); > + > + for (i = 0; i < ARRAY_SIZE(commands); i++) { > + const char *prefix = commands[i].prefix; > + char c; > + if (!skip_prefix(input.buf, prefix, &cmd_end)) > + continue; > + /* > + * If the command has arguments, verify that it's > + * followed by a space. Otherwise, it shall be followed > + * by a line terminator. > + */ > + c = commands[i].takes_args ? ' ' : line_termination; > + if (input.buf[strlen(prefix)] != c) As I pointed out in the last round you can use cmd_end here rather than calling strlen. I've just noticed that this will fail for a command that does not take arguments when the last input line does not end with '\n'. I think if (*cmd_end && *cmd_end != c) would be safer > + die("arguments invalid for command: %s", commands[i].prefix); > + > + cmd = &commands[i]; > + if (cmd->takes_args) { > + p = cmd_end + 1; > + // strip newline before handing it to the > + // handling function > + input.buf[strcspn(input.buf, "\n")] = '\0'; When I suggested replacing strstr() last time I failed to notice that we know that if there is a newline character it is at input.buf[len - 1] so we don't need to scan the whole string to find it. if (input.buf[input.len - 1] == '\n') input.buf[--buf.len] = '\0'; (feel free to change it to avoid the prefix operator) > + } > + > + break; > + } > + > + if (!cmd) > + die("unknown command: %s", input.buf); > + > + cmd->fn(opt, p, output, data); > + } > + strbuf_release(&input); > +} > + > static int batch_objects(struct batch_options *opt) > { > struct strbuf input = STRBUF_INIT; > @@ -515,6 +606,7 @@ static int batch_objects(struct batch_options *opt) > struct expand_data data; > int save_warning; > int retval = 0; > + const int command = opt->command; > > if (!opt->format) > opt->format = "%(objectname) %(objecttype) %(objectsize)"; > @@ -529,7 +621,7 @@ static int batch_objects(struct batch_options *opt) > strbuf_expand(&output, opt->format, expand_format, &data); > data.mark_query = 0; > strbuf_release(&output); > - if (opt->cmdmode) > + if (opt->mode) > data.split_on_whitespace = 1; > > /* > @@ -590,6 +682,9 @@ static int batch_objects(struct batch_options *opt) > save_warning = warn_on_object_refname_ambiguity; > warn_on_object_refname_ambiguity = 0; > > + if (command) > + batch_objects_command(opt, &output, &data); > + Moving this is good as we no longer need to touch the line below > while (strbuf_getline(&input, stdin) != EOF) { > if (data.split_on_whitespace) { > /* > @@ -636,6 +731,7 @@ static int batch_option_callback(const struct option *opt, > > bo->enabled = 1; > bo->print_contents = !strcmp(opt->long_name, "batch"); > + bo->command = !strcmp(opt->long_name, "batch-command"); > bo->format = arg; > > return 0; > @@ -683,6 +779,10 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix) > N_("like --batch, but don't emit <contents>"), > PARSE_OPT_OPTARG | PARSE_OPT_NONEG, > batch_option_callback), > + OPT_CALLBACK_F(0, "batch-command", &batch, N_(""), > + N_("enters batch mode that accepts commands"), > + PARSE_OPT_NOARG | PARSE_OPT_NONEG, > + batch_option_callback), > OPT_CMDMODE(0, "batch-all-objects", &opt, > N_("with --batch[-check]: ignores stdin, batches all known objects"), 'b'), > /* Batch-specific options */ > @@ -742,7 +842,7 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix) > /* Return early if we're in batch mode? */ > if (batch.enabled) { > if (opt_cw) > - batch.cmdmode = opt; > + batch.mode = opt; > else if (opt && opt != 'b') > usage_msg_optf(_("'-%c' is incompatible with batch mode"), > usage, options, opt); > diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh > index 145eee11df..92f0b14a95 100755 > --- a/t/t1006-cat-file.sh > +++ b/t/t1006-cat-file.sh > @@ -112,6 +112,46 @@ maybe_remove_timestamp () { > fi > } It's really good that you've found a way to test fflush > +run_buffer_test () { > + type=$1 > + sha1=$2 > + size=$3 > + flush=$4 > + > + mkfifo V.input > + exec 8<>V.input > + rm V.input > + > + mkfifo V.output > + exec 9<>V.output > + rm V.output > + > + ( > + git cat-file --buffer --batch-command <&8 >&9 & > + echo $! >&9 && > + wait $! > + echo >&9 EARLY_EXIT > + ) & > + sh_pid=$! > + read fi_pid <&9 > + test_when_finished " > + exec 8>&-; exec 9>&-; > + kill $sh_pid && wait $sh_pid > + kill $fi_pid && wait $fi_pid > + true" > + expect=$(echo "$sha1 $type $size") > + echo "info $sha1" >&8 > + if test "$flush" = "true" > + then > + echo "fflush" >&8 If I change "fflush" to "bad-command" then this test still passes. This is because die() flushes stdout and the read below will get the output it was expecting and not 'EARLY_EXIT'. I have a suggested change below to get around that also checks the exit code of "git cat-file" (This test effectively puts a git command upstream of a pipe which we try to avoid in order to pick up any crashes) > + else > + expect="EARLY_EXIT" > + kill $fi_pid && wait $fi_pid > + fi > + read actual <&9 > + test "$actual" = "$expect" > +} > + > run_tests () { > type=$1 > sha1=$2 > @@ -177,6 +217,18 @@ $content" > test_cmp expect actual > ' Here is my suggestion based on your test - I ended up splitting the fflush and no-fflush cases run_buffer_test_flush () { type=$1 sha1=$2 size=$3 mkfifo input && test_when_finished 'rm input' mkfifo output && exec 9<>output && test_when_finished 'rm output; exec 9<&-' ( # TODO - Ideally we'd pipe the output of cat-file # through "sed s'/$/\\/'" to make sure that that read # would consume all the available # output. Unfortunately we cannot do this as we cannot # control when sed flushes its output. We could write # a test helper in C that appended a '\' to the end of # each line and flushes its output after every line. git cat-file --buffer --batch-command <input 2>err & echo $! && wait $! echo $? ) >&9 & sh_pid=$! && read cat_file_pid <&9 && test_when_finished "kill $cat_file_pid kill $sh_pid; wait $sh_pid; :" && ( test_write_lines "info $sha1" fflush "info $sha1" && # TODO - consume all available input, not just one # line (see above). read actual <&9 && echo "$actual" >actual && echo "$sha1 $type $size" >expect && test_cmp expect actual ) >input && # check output is flushed on exit read actual <&9 && echo "$actual" >actual && test_cmp expect actual && test_must_be_empty err && read status <&9 && test "$status" -eq 0 } run_buffer_test_no_flush () { type=$1 sha1=$2 size=$3 mkfifo input && test_when_finished 'rm input' mkfifo pid && exec 9<>pid && test_when_finished 'rm pid; exec 9<&-' ( git cat-file --buffer --batch-command <input >output & echo $! && wait $! echo $? ) >&9 & sh_pid=$! && read cat_file_pid <&9 && test_when_finished "kill $cat_file_pid kill $sh_pid; wait $sh_pid; :" && ( test_write_lines "info $sha1" "info $sha1" && kill $cat_file_pid && read status <&9 && test "$status" -ne 0 && test_must_be_empty output ) >input } Best Wishes Phillip > + test -z "$content" || > + test_expect_success "--batch-command output of $type content is correct" ' > + maybe_remove_timestamp "$batch_output" $no_ts >expect && > + maybe_remove_timestamp "$(echo object $sha1 | git cat-file --batch-command)" $no_ts >actual && > + test_cmp expect actual > + ' > + > + test_expect_success "batch-command output of $type info is correct" ' > + echo "$sha1 $type $size" >expect && > + echo "info $sha1" | git cat-file --batch-command >actual && > + test_cmp expect actual > +' > test_expect_success "custom --batch-check format" ' > echo "$type $sha1" >expect && > echo $sha1 | git cat-file --batch-check="%(objecttype) %(objectname)" >actual && > @@ -232,12 +284,29 @@ test_expect_success '--batch-check without %(rest) considers whole line' ' > test_cmp expect actual > ' > > +test_expect_success '--batch-command --buffer with flush is correct for blob' ' > + run_buffer_test 'blob' $hello_sha1 $hello_size true > +' > + > +test_expect_success '--batch-command --buffer without flush is correct for blob' ' > + run_buffer_test 'blob' $hello_sha1 $hello_size false > +' > + > tree_sha1=$(git write-tree) > + > tree_size=$(($(test_oid rawsz) + 13)) > tree_pretty_content="100644 blob $hello_sha1 hello" > > run_tests 'tree' $tree_sha1 $tree_size "" "$tree_pretty_content" > > +test_expect_success '--batch-command --buffer with flush is correct for tree' ' > + run_buffer_test 'tree' $tree_sha1 $tree_size true > +' > + > +test_expect_success '--batch-command --buffer without flush is correct for tree' ' > + run_buffer_test 'tree' $tree_sha1 $tree_size false > +' > + > commit_message="Initial commit" > commit_sha1=$(echo_without_newline "$commit_message" | git commit-tree $tree_sha1) > commit_size=$(($(test_oid hexsz) + 137)) > @@ -263,6 +332,14 @@ tag_size=$(strlen "$tag_content") > > run_tests 'tag' $tag_sha1 $tag_size "$tag_content" "$tag_content" 1 > > +test_expect_success '--batch-command --buffer with flush is correct for tag' ' > + run_buffer_test 'tag' $tag_sha1 $tag_size true > +' > + > +test_expect_success '--batch-command --buffer without flush is correct for tag' ' > + run_buffer_test 'tag' $tag_sha1 $tag_size false > +' > + > test_expect_success \ > "Reach a blob from a tag pointing to it" \ > "test '$hello_content' = \"\$(git cat-file blob $tag_sha1)\"" > @@ -964,4 +1041,10 @@ test_expect_success 'cat-file --batch-all-objects --batch-check ignores replace' > test_cmp expect actual > ' > > +test_expect_success 'batch-command unknown command' ' > + echo unknown_command >cmd && > + test_expect_code 128 git cat-file --batch-command < cmd 2>err && > + grep -E "^fatal:.*unknown command.*" err > +' > + > test_done
On Tue, Feb 01, 2022 at 10:39:30AM +0100, Christian Couder wrote: > Also I think at this point this should probably not be an RFC patch > anymore but a regular one. I think that this is due to my first "review" in [1], where I tried to get an understanding of what John's concrete usage plans were, since I couldn't figure out what they were on my own. I'm not sure that I've seen a response along the lines of "we need to control when the output stream is flushed in order to do ..." yet, but I would be interested to see one before moving too much further ahead of where we already are. (Apologies if such a response was written, and I missed it). Thanks, Taylor [1]: https://lore.kernel.org/git/YehomwNiIs0l83W7@nand.local/
Hi Taylor, On 1 Feb 2022, at 12:52, Taylor Blau wrote: > On Tue, Feb 01, 2022 at 10:39:30AM +0100, Christian Couder wrote: >> Also I think at this point this should probably not be an RFC patch >> anymore but a regular one. > > I think that this is due to my first "review" in [1], where I tried to > get an understanding of what John's concrete usage plans were, since I > couldn't figure out what they were on my own. > > I'm not sure that I've seen a response along the lines of "we need to > control when the output stream is flushed in order to do ..." yet, but I > would be interested to see one before moving too much further ahead of > where we already are. This would be useful when there is another process A interacting with a long running git cat-file process B that is retrieving object information from the odb interactively but also wants to use --buffer mode. In this scenario, if A is asked to retrieve a large list of object metadata, it wants to use --buffer mode to be more efficient but it would need a way to ensure that all of the contents have been flushed to the output. If we want to keep B running to save startup time (since otherwise we might need to spawn many git cat-file processes), then having a flush command where we can guarantee a flush would be very handy. > > (Apologies if such a response was written, and I missed it). Nope, don't think I explained the need for a flush command very clearly. thanks > > Thanks, > Taylor > > [1]: https://lore.kernel.org/git/YehomwNiIs0l83W7@nand.local/
On Tue, Feb 01, 2022 at 02:27:54PM -0500, John Cai wrote: > On 1 Feb 2022, at 12:52, Taylor Blau wrote: > > I'm not sure that I've seen a response along the lines of "we need to > > control when the output stream is flushed in order to do ..." yet, but I > > would be interested to see one before moving too much further ahead of > > where we already are. > > This would be useful when there is another process A interacting with > a long running git cat-file process B that is retrieving object > information from the odb interactively but also wants to use --buffer > mode. Let me try and repeat my understanding of what you said to make sure that I fully grok the use-case you have in mind. You have a repository and want to have a long-running `git cat-file` process that can serve multiple requests. Because the processes which interact with your long-running `cat-file` may ask for many objects, you don't want to flush the output buffer after each object, and so would ideally like to use `--buffer`. But that doesn't quite work, since the `cat-file` process may not have decided to flush its output buffer even when process A is about to go away. I wonder about the viability of accomplishing this via a signal handler, i.e., that `cat-file` would call fflush(2) whenever it receives e.g., SIGUSR1. A couple of possible downsides: - SIGUSR1 doesn't exist on Windows AFAIK. - There are definitely going to be synchrony issues to contend with. What happens if we receive our signal while writing to the output stream? I think you would just need to mark a variable that indicates we should flush after finishing serving the current request, but I haven't thought too hard about it. So maybe a signal isn't the way to go. But I don't think `--stdin-cmd` is the simplest approach either. At the very least, I don't totally understand your plan after implementing a flush command. You mention that it would be nice to implement other commands, but I'm not totally convinced by your examples[1]. I wonder if we could strike a middle ground, which might look like `git cat-file --batch --buffer`, and just feeding it something which we know for certain isn't an object identifier. In other words, what if we did something as simple as: --- >8 --- diff --git a/builtin/cat-file.c b/builtin/cat-file.c index d94050e6c1..bae162fc18 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -595,6 +595,11 @@ static int batch_objects(struct batch_options *opt) warn_on_object_refname_ambiguity = 0; while (strbuf_getline(&input, stdin) != EOF) { + if (!strcmp("<flush>", input.buf)) { + fflush(stdout); + continue; + } + if (data.split_on_whitespace) { /* * Split at first whitespace, tying off the beginning --- 8< --- On the other hand, something even hackier than the above is that we flush stdout whenever we get a request to print an object which could not be found. So if you feed a single "\n" to your `cat-file` process, you'll get " missing" on its output, and the buffer will immediately be flushed. I'm not sure that I'd recommend relying on that behavior exactly, but if you're looking for a short-term solution, it might work ;). Thanks, Taylor [1]: One that comes to mind is changing the output format mid-stream. But how often does it really make sense to change the output format? I can understand wanting to flush at the end asking cat-file for a bunch of objects, but I don't see how you would want to change the output format often enough that shaving off Git's negligible startup cost is worthwhile (or couldn't be accomplished by just spawning another cat-file process and using that).
On Tue, Feb 01, 2022 at 08:34:06PM -0500, John Cai wrote: > > So maybe a signal isn't the way to go. But I don't think `--stdin-cmd` > > is the simplest approach either. At the very least, I don't totally > > understand your plan after implementing a flush command. You mention > > that it would be nice to implement other commands, but I'm not totally > > convinced by your examples[1]. > > I agree that if flush were the only use case for a new flag, it might not > be worth it. But, the flush command is only one of the use cases that a > --stdin-cmd (now changed to --batch-command per Phillip's suggestion) > > There are two other commands in this RFC > > object <rev> > info <rev> > > These are described in [1] that are also a motivation for a command > interface. > The description in [1] explains why a command interface would be necessary. This seems like a more realistic and well-motivated proposal, IMHO. I am a little curious that having the ability to switch between asking for an object's contents and its metadata would lead to saving thousands of processes per server. (Apropos of this series, I am curious: how long does GitLab keep a pair of cat-file programs running for each repository?) In any case, I'll take a closer look over the aforementioned version and let you know what my thoughts are, probably tomorrow. > 1. https://lore.kernel.org/git/20220128183319.43496-1-johncai86@gmail.com/ Thanks, Taylor
Hi Christian On 1 Feb 2022, at 4:39, Christian Couder wrote: > On Mon, Jan 31, 2022 at 9:34 PM John Cai <johncai86@gmail.com> wrote: >> >> This RFC patch proposes a new flag --batch-command that works with >> git-cat-file --batch. > > The subject is "Re: [RFC v3] cat-file: add a --stdin-cmd mode" and now > you are talking about '--batch-command' instead of '--stdin-cmd'. > > "that works with git-cat-file --batch" is not very clear. Maybe you > could find a wording that explains better how --batch-command is > different from --batch. > > Also I think at this point this should probably not be an RFC patch > anymore but a regular one. > >> Similar to git-update-ref --stdin, it will accept >> commands and arguments from stdin. >> >> The start of this idea was discussed in [1], where the original >> motivation was to be able to control when the buffer was flushed to >> stdout in --buffer mode. > > That would be nice in a cover letter but I am not sure a commit > message is the right place for this. > >> However, this can actually be much more useful in situations when >> git-cat-file --batch is being used as a long lived backend query >> process. At GitLab, we use a pair of cat-file processes. One for >> iterating over object metadata with --batch-check, and the other to grab >> object contents with --batch. However, if we had --batch-command, we could >> get rid of the second --batch-check process, > > Maybe s/second// would make it clear that there are no two > --batch-command processes. > >> and just have one process >> where we can flip between getting object info, and getting object contents. >> This can lead to huge savings since on a given server there could be hundreds to >> thousands of git cat-file processes at a time. > > It's not clear if all the git cat-file processes you are talking about > are mostly --batch-check processes or --batch processes, or a roughly > equal amount of both. My guess is the latter and that --batch-command > would mean that there would be around two times fewer cat-file > processes. > >> git cat-file --batch-command >> >> $ <command> [arg1] [arg2] NL > > It's a bit unclear what the 2 above lines mean. Maybe you could add a > small explanation like for example "The new flag can be used like > this:" and "It receives commands from stdin in the format:" > > Also not sure why there is a '$' char in front of '<command> [arg1] > [arg2] NL' but not in front of 'git cat-file --batch-command'. It > doesn't look like in the 'git update-ref --stdin' doc that '$' are > used in front of the commands that can be passed through stdin. > >> This patch adds three commands: object, info, fflush > > Maybe s/three commands/the following first three commands/ > >> $ object <sha1> NL >> $ info <sha1> NL >> $ fflush NL > > Idem about '$'. > >> These three would be immediately useful in GitLab's context, but one can >> imagine this mode to be further extended for other things. > > Not very clear which "mode" you are talking about. You have been > talking about a mode only in the subject so far. Maybe you should talk > a bit about that above when '<command> [arg1] [arg2] NL' is > introduced. > > Also you don't talk about the output format. --batch and --batch-check > accept [=<format>], but it looks like --batch-command doesn't. > >> Future improvements: >> - a non-trivial part of "cat-file --batch" time is spent >> on parsing its argument and seeing if it's a revision, ref etc. So we >> could add a command that only accepts a full-length 40 >> character SHA-1. > > In a cover letter that would be ok, but I am not sure that a commit > message is the best place for this kind of details about future work. > >> This would be the first step in adding such an interface to >> git-cat-file. >> >> [1] https://lore.kernel.org/git/pull.1124.git.git.1636149400.gitgitgadget@gmail.com/ >> >> Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> >> Signed-off-by: John Cai <johncai86@gmail.com> >> --- >> >> Taylor, I'd be interested in your thoughts on this proposal since you helped >> review the previous patch that turned into this RFC. Thanks! >> >> Changes from v2: >> >> - refactored tests to be within run_tests() >> - added a test to test --buffer behavior with fflush >> - cleaned up cat-file.c: clarified var names, removed unnecessary code >> based on suggestions from Phillip Wood >> - removed strvec changes >> >> Changes from v1: >> >> - changed option name to batch-command. >> - changed command function interface to receive the whole line after the command >> name to put the onus of parsing arguments to each individual command function. >> - pass in whole line to batch_one_object in both parse_cmd_object and >> parse_cmd_info to support spaces in the object reference. >> - removed addition of -z to include in a separate patch series >> - added documentation. >> --- >> Documentation/git-cat-file.txt | 15 +++++ >> builtin/cat-file.c | 114 +++++++++++++++++++++++++++++++-- >> t/t1006-cat-file.sh | 83 ++++++++++++++++++++++++ >> 3 files changed, 205 insertions(+), 7 deletions(-) >> >> diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt >> index bef76f4dd0..254e546c79 100644 >> --- a/Documentation/git-cat-file.txt >> +++ b/Documentation/git-cat-file.txt >> @@ -96,6 +96,21 @@ OPTIONS >> need to specify the path, separated by whitespace. See the >> section `BATCH OUTPUT` below for details. >> >> +--batch-command:: >> + Enter a command mode that reads from stdin. > > Maybe s/a command mode that reads from stdin/a mode that reads > commands from stdin/ > > Also I would expect something about the output, like perhaps "...and > ouputs the command results to stdout". > >> May not be combined with any >> + other options or arguments except `--textconv` or `--filters`, in which >> + case the input lines also need to specify the path, separated by >> + whitespace. See the section `BATCH OUTPUT` below for details. > > The BATCH OUTPUT section says that a format can be passed but that > doesn't seem to be the case with --batch-command. So you might need to > make some changes to that section too or add a bit more details about > the output here. Thinking about this more, I wonder if it'd be worth it to allow the --batch-command=<format> for backwards compatibility reasons for users who switch over to using --batch-command from --batch and --batch-check. The only thing that makes me hesitant is that the <format> would only be relevant to the "info" and "object" commands instead of being relevant to all commands in --batch-command mode. > >> +object <object>:: >> + Print object contents for object reference <object> >> + >> +info <object>:: >> + Print object info for object reference <object> >> + >> +flush:: >> + Flush to stdout immediately when used with --buffer >> + >> --batch-all-objects:: >> Instead of reading a list of objects on stdin, perform the >> requested batch operation on all objects in the repository and >> diff --git a/builtin/cat-file.c b/builtin/cat-file.c >> index 7b3f42950e..cc9e47943b 100644 >> --- a/builtin/cat-file.c >> +++ b/builtin/cat-file.c >> @@ -24,9 +24,11 @@ struct batch_options { >> int buffer_output; >> int all_objects; >> int unordered; >> - int cmdmode; /* may be 'w' or 'c' for --filters or --textconv */ >> + int mode; /* may be 'w' or 'c' for --filters or --textconv */ >> const char *format; >> + int command; >> }; > > Maybe add a blank line here. > >> +static char line_termination = '\n'; >> >> static const char *force_path; >> >> @@ -302,19 +304,19 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d >> if (data->type == OBJ_BLOB) { >> if (opt->buffer_output) >> fflush(stdout); >> - if (opt->cmdmode) { >> + if (opt->mode) { > > The mechanical s/cmdmode/mode/g change could have been made in a > preparatory patch to make this patch a bit smaller and easier to > digest. > >> +static void batch_objects_command(struct batch_options *opt, >> + struct strbuf *output, >> + struct expand_data *data) >> +{ >> + struct strbuf input = STRBUF_INIT; >> + >> + /* Read each line dispatch its command */ >> + while (!strbuf_getwholeline(&input, stdin, line_termination)) { >> + int i; >> + const struct parse_cmd *cmd = NULL; >> + const char *p, *cmd_end; >> + >> + if (*input.buf == line_termination) >> + die("empty command in input"); >> + else if (isspace(*input.buf)) >> + die("whitespace before command: %s", input.buf); >> + >> + for (i = 0; i < ARRAY_SIZE(commands); i++) { >> + const char *prefix = commands[i].prefix; >> + char c; >> + if (!skip_prefix(input.buf, prefix, &cmd_end)) >> + continue; >> + /* >> + * If the command has arguments, verify that it's >> + * followed by a space. Otherwise, it shall be followed >> + * by a line terminator. >> + */ >> + c = commands[i].takes_args ? ' ' : line_termination; >> + if (input.buf[strlen(prefix)] != c) >> + die("arguments invalid for command: %s", commands[i].prefix); >> + >> + cmd = &commands[i]; >> + if (cmd->takes_args) { >> + p = cmd_end + 1; >> + // strip newline before handing it to the >> + // handling function > > So above the /* */ comments delimiters are used but here // is used. I > am not sure we support // these days, but if we do, I think it would > be better to avoid mixing comment styles in the same function. > >> + input.buf[strcspn(input.buf, "\n")] = '\0'; >> + } >> + >> + break; >> + } >> + >> + if (!cmd) >> + die("unknown command: %s", input.buf); >> + >> + cmd->fn(opt, p, output, data); >> + } >> + strbuf_release(&input); >> +} > >> @@ -590,6 +682,9 @@ static int batch_objects(struct batch_options *opt) >> save_warning = warn_on_object_refname_ambiguity; >> warn_on_object_refname_ambiguity = 0; >> >> + if (command) >> + batch_objects_command(opt, &output, &data); >> + >> while (strbuf_getline(&input, stdin) != EOF) { > > I think batch_objects_command() will consume everything from stdin, so > it doesn't make sense to try to read again from stdin after it. Maybe > the whole while (...) { ... } clause should be inside an else clause > or something. > >> if (data.split_on_whitespace) { >> /* >> @@ -636,6 +731,7 @@ static int batch_option_callback(const struct option *opt, >> >> bo->enabled = 1; >> bo->print_contents = !strcmp(opt->long_name, "batch"); >> + bo->command = !strcmp(opt->long_name, "batch-command"); >> bo->format = arg; >> >> return 0;
Hi Phillip On 1 Feb 2022, at 5:43, Phillip Wood wrote: > Hi John > > On 28/01/2022 18:33, John Cai wrote: >> This RFC patch proposes a new flag --batch-command that works with >> git-cat-file --batch. Similar to git-update-ref --stdin, it will accept >> commands and arguments from stdin. >> >> The start of this idea was discussed in [1], where the original >> motivation was to be able to control when the buffer was flushed to >> stdout in --buffer mode. >> >> However, this can actually be much more useful in situations when >> git-cat-file --batch is being used as a long lived backend query >> process. At GitLab, we use a pair of cat-file processes. One for >> iterating over object metadata with --batch-check, and the other to grab >> object contents with --batch. However, if we had --batch-command, we could >> get rid of the second --batch-check process, and just have one process >> where we can flip between getting object info, and getting object contents. >> This can lead to huge savings since on a given server there could be hundreds to >> thousands of git cat-file processes at a time. >> >> git cat-file --batch-command >> >> $ <command> [arg1] [arg2] NL >> >> This patch adds three commands: object, info, fflush >> >> $ object <sha1> NL >> $ info <sha1> NL >> $ fflush NL >> >> These three would be immediately useful in GitLab's context, but one can >> imagine this mode to be further extended for other things. > > I think this is moving in the right direction, I've left some comments below. > >> Future improvements: >> - a non-trivial part of "cat-file --batch" time is spent >> on parsing its argument and seeing if it's a revision, ref etc. So we >> could add a command that only accepts a full-length 40 >> character SHA-1. >> >> This would be the first step in adding such an interface to >> git-cat-file. >> >> [1] https://lore.kernel.org/git/pull.1124.git.git.1636149400.gitgitgadget@gmail.com/ >> >> Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> >> Signed-off-by: John Cai <johncai86@gmail.com> >> --- >> >> Taylor, I'd be interested in your thoughts on this proposal since you helped >> review the previous patch that turned into this RFC. Thanks! >> >> Changes from v2: >> >> - refactored tests to be within run_tests() >> - added a test to test --buffer behavior with fflush >> - cleaned up cat-file.c: clarified var names, removed unnecessary code >> based on suggestions from Phillip Wood >> - removed strvec changes >> >> Changes from v1: >> >> - changed option name to batch-command. >> - changed command function interface to receive the whole line after the command >> name to put the onus of parsing arguments to each individual command function. >> - pass in whole line to batch_one_object in both parse_cmd_object and >> parse_cmd_info to support spaces in the object reference. >> - removed addition of -z to include in a separate patch series >> - added documentation. >> --- >> Documentation/git-cat-file.txt | 15 +++++ >> builtin/cat-file.c | 114 +++++++++++++++++++++++++++++++-- >> t/t1006-cat-file.sh | 83 ++++++++++++++++++++++++ >> 3 files changed, 205 insertions(+), 7 deletions(-) >> >> diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt >> index bef76f4dd0..254e546c79 100644 >> --- a/Documentation/git-cat-file.txt >> +++ b/Documentation/git-cat-file.txt >> @@ -96,6 +96,21 @@ OPTIONS >> need to specify the path, separated by whitespace. See the >> section `BATCH OUTPUT` below for details. >> +--batch-command:: >> + Enter a command mode that reads from stdin. May not be combined with any >> + other options or arguments except `--textconv` or `--filters`, in which >> + case the input lines also need to specify the path, separated by >> + whitespace. See the section `BATCH OUTPUT` below for details. >> + >> +object <object>:: >> + Print object contents for object reference <object> >> + >> +info <object>:: >> + Print object info for object reference <object> >> + >> +flush:: > > I think this is a better name than fflush as calling fflush is just an implementation detail. I've been thinking about this and have some concerns with the current implementation as it allows the caller to force a flush but the output may be flushed at other times and so it does not allow for deadlock free reading and writing to cat-file from a single process. If instead we buffered the input lines and only processed then when we received a flush command it would be easy to read and write to cat-file from a single process without deadlocks. This would mean that callers would have to add flush commands to get any output before they closed the pipe writing to cat-file. To address this concern, I wonder if instead of a flush command what we could do is to have --batch-command support "read transactions" similar to update-ref. eg: git cat-file --batch-command begin info <sha1> <sha1> <sha1> <sha1> commit We could queue up these references, and then get them all once "commit" is entered and call fflush(). > >> + Flush to stdout immediately when used with --buffer >> + >> --batch-all-objects:: >> Instead of reading a list of objects on stdin, perform the >> requested batch operation on all objects in the repository and >> diff --git a/builtin/cat-file.c b/builtin/cat-file.c >> index 7b3f42950e..cc9e47943b 100644 >> --- a/builtin/cat-file.c >> +++ b/builtin/cat-file.c >> @@ -24,9 +24,11 @@ struct batch_options { >> int buffer_output; >> int all_objects; >> int unordered; >> - int cmdmode; /* may be 'w' or 'c' for --filters or --textconv */ >> + int mode; /* may be 'w' or 'c' for --filters or --textconv */ > > If you really need to rename this variable then doing that in a separate preparatory patch would make this easier to review as it separates out two separate sets of changes. > >> const char *format; >> + int command; > > Can't we just call this batch_command and leave cmdmode alone? I don't know maybe cmdmode would be confused with having something to do with batch_command then. > >> }; >> +static char line_termination = '\n'; > > This seems unnecessary as it is the only permitted line ending > >> static const char *force_path; >> @@ -302,19 +304,19 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d >> if (data->type == OBJ_BLOB) { >> if (opt->buffer_output) >> fflush(stdout); >> - if (opt->cmdmode) { >> + if (opt->mode) { >> char *contents; >> unsigned long size; >> if (!data->rest) >> die("missing path for '%s'", oid_to_hex(oid)); >> - if (opt->cmdmode == 'w') { >> + if (opt->mode == 'w') { >> if (filter_object(data->rest, 0100644, oid, >> &contents, &size)) >> die("could not convert '%s' %s", >> oid_to_hex(oid), data->rest); >> - } else if (opt->cmdmode == 'c') { >> + } else if (opt->mode == 'c') { >> enum object_type type; >> if (!textconv_object(the_repository, >> data->rest, 0100644, oid, >> @@ -326,7 +328,7 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d >> die("could not convert '%s' %s", >> oid_to_hex(oid), data->rest); >> } else >> - BUG("invalid cmdmode: %c", opt->cmdmode); >> + BUG("invalid mode: %c", opt->mode); >> batch_write(opt, contents, size); >> free(contents); >> } else { >> @@ -508,6 +510,95 @@ static int batch_unordered_packed(const struct object_id *oid, >> data); >> } >> +static void parse_cmd_object(struct batch_options *opt, >> + const char *line, >> + struct strbuf *output, >> + struct expand_data *data) >> +{ >> + opt->print_contents = 1; >> + batch_one_object(line, output, opt, data); >> +} >> + >> +static void parse_cmd_info(struct batch_options *opt, >> + const char *line, >> + struct strbuf *output, >> + struct expand_data *data) >> +{ >> + opt->print_contents = 0; >> + batch_one_object(line, output, opt, data); >> +} >> + >> +static void parse_cmd_fflush(struct batch_options *opt, >> + const char *line, >> + struct strbuf *output, >> + struct expand_data *data) >> +{ >> + fflush(stdout); >> +} >> + >> +typedef void (*parse_cmd_fn_t)(struct batch_options *, const char *, >> + struct strbuf *, struct expand_data *); >> + >> +static const struct parse_cmd { >> + const char *prefix; >> + parse_cmd_fn_t fn; >> + unsigned takes_args; >> +} commands[] = { >> + { "object", parse_cmd_object, 1}, >> + { "info", parse_cmd_info, 1}, >> + { "fflush", parse_cmd_fflush, 0}, >> +}; >> + >> +static void batch_objects_command(struct batch_options *opt, >> + struct strbuf *output, >> + struct expand_data *data) >> +{ >> + struct strbuf input = STRBUF_INIT; >> + >> + /* Read each line dispatch its command */ >> + while (!strbuf_getwholeline(&input, stdin, line_termination)) { >> + int i; >> + const struct parse_cmd *cmd = NULL; >> + const char *p, *cmd_end; >> + >> + if (*input.buf == line_termination) >> + die("empty command in input"); >> + else if (isspace(*input.buf)) >> + die("whitespace before command: %s", input.buf); >> + >> + for (i = 0; i < ARRAY_SIZE(commands); i++) { >> + const char *prefix = commands[i].prefix; >> + char c; >> + if (!skip_prefix(input.buf, prefix, &cmd_end)) >> + continue; >> + /* >> + * If the command has arguments, verify that it's >> + * followed by a space. Otherwise, it shall be followed >> + * by a line terminator. >> + */ >> + c = commands[i].takes_args ? ' ' : line_termination; >> + if (input.buf[strlen(prefix)] != c) > > As I pointed out in the last round you can use cmd_end here rather than calling strlen. I've just noticed that this will fail for a command that does not take arguments when the last input line does not end with '\n'. I think > if (*cmd_end && *cmd_end != c) > would be safer > >> + die("arguments invalid for command: %s", commands[i].prefix); >> + >> + cmd = &commands[i]; >> + if (cmd->takes_args) { >> + p = cmd_end + 1; >> + // strip newline before handing it to the >> + // handling function >> + input.buf[strcspn(input.buf, "\n")] = '\0'; > > When I suggested replacing strstr() last time I failed to notice that we know that if there is a newline character it is at input.buf[len - 1] so we don't need to scan the whole string to find it. > > if (input.buf[input.len - 1] == '\n') > input.buf[--buf.len] = '\0'; > > (feel free to change it to avoid the prefix operator) > >> + } >> + >> + break; >> + } >> + >> + if (!cmd) >> + die("unknown command: %s", input.buf); >> + >> + cmd->fn(opt, p, output, data); >> + } >> + strbuf_release(&input); >> +} >> + >> static int batch_objects(struct batch_options *opt) >> { >> struct strbuf input = STRBUF_INIT; >> @@ -515,6 +606,7 @@ static int batch_objects(struct batch_options *opt) >> struct expand_data data; >> int save_warning; >> int retval = 0; >> + const int command = opt->command; >> if (!opt->format) >> opt->format = "%(objectname) %(objecttype) %(objectsize)"; >> @@ -529,7 +621,7 @@ static int batch_objects(struct batch_options *opt) >> strbuf_expand(&output, opt->format, expand_format, &data); >> data.mark_query = 0; >> strbuf_release(&output); >> - if (opt->cmdmode) >> + if (opt->mode) >> data.split_on_whitespace = 1; >> /* >> @@ -590,6 +682,9 @@ static int batch_objects(struct batch_options *opt) >> save_warning = warn_on_object_refname_ambiguity; >> warn_on_object_refname_ambiguity = 0; >> + if (command) >> + batch_objects_command(opt, &output, &data); >> + > > Moving this is good as we no longer need to touch the line below > >> while (strbuf_getline(&input, stdin) != EOF) { >> if (data.split_on_whitespace) { >> /* >> @@ -636,6 +731,7 @@ static int batch_option_callback(const struct option *opt, >> bo->enabled = 1; >> bo->print_contents = !strcmp(opt->long_name, "batch"); >> + bo->command = !strcmp(opt->long_name, "batch-command"); >> bo->format = arg; >> return 0; >> @@ -683,6 +779,10 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix) >> N_("like --batch, but don't emit <contents>"), >> PARSE_OPT_OPTARG | PARSE_OPT_NONEG, >> batch_option_callback), >> + OPT_CALLBACK_F(0, "batch-command", &batch, N_(""), >> + N_("enters batch mode that accepts commands"), >> + PARSE_OPT_NOARG | PARSE_OPT_NONEG, >> + batch_option_callback), >> OPT_CMDMODE(0, "batch-all-objects", &opt, >> N_("with --batch[-check]: ignores stdin, batches all known objects"), 'b'), >> /* Batch-specific options */ >> @@ -742,7 +842,7 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix) >> /* Return early if we're in batch mode? */ >> if (batch.enabled) { >> if (opt_cw) >> - batch.cmdmode = opt; >> + batch.mode = opt; >> else if (opt && opt != 'b') >> usage_msg_optf(_("'-%c' is incompatible with batch mode"), >> usage, options, opt); >> diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh >> index 145eee11df..92f0b14a95 100755 >> --- a/t/t1006-cat-file.sh >> +++ b/t/t1006-cat-file.sh >> @@ -112,6 +112,46 @@ maybe_remove_timestamp () { >> fi >> } > > It's really good that you've found a way to test fflush > >> +run_buffer_test () { >> + type=$1 >> + sha1=$2 >> + size=$3 >> + flush=$4 >> + >> + mkfifo V.input >> + exec 8<>V.input >> + rm V.input >> + >> + mkfifo V.output >> + exec 9<>V.output >> + rm V.output >> + >> + ( >> + git cat-file --buffer --batch-command <&8 >&9 & >> + echo $! >&9 && >> + wait $! >> + echo >&9 EARLY_EXIT >> + ) & >> + sh_pid=$! >> + read fi_pid <&9 > + test_when_finished " >> + exec 8>&-; exec 9>&-; >> + kill $sh_pid && wait $sh_pid >> + kill $fi_pid && wait $fi_pid >> + true" >> + expect=$(echo "$sha1 $type $size") >> + echo "info $sha1" >&8 >> + if test "$flush" = "true" >> + then >> + echo "fflush" >&8 > > If I change "fflush" to "bad-command" then this test still passes. This is because die() flushes stdout and the read below will get the output it was expecting and not 'EARLY_EXIT'. I have a suggested change below to get around that also checks the exit code of "git cat-file" (This test effectively puts a git command upstream of a pipe which we try to avoid in order to pick up any crashes) > >> + else >> + expect="EARLY_EXIT" >> + kill $fi_pid && wait $fi_pid >> + fi >> + read actual <&9 >> + test "$actual" = "$expect" >> +} >> + >> run_tests () { >> type=$1 >> sha1=$2 >> @@ -177,6 +217,18 @@ $content" >> test_cmp expect actual >> ' > > Here is my suggestion based on your test - I ended up splitting the fflush and no-fflush cases > > run_buffer_test_flush () { > type=$1 > sha1=$2 > size=$3 > > mkfifo input && > test_when_finished 'rm input' > mkfifo output && > exec 9<>output && > test_when_finished 'rm output; exec 9<&-' > ( > # TODO - Ideally we'd pipe the output of cat-file > # through "sed s'/$/\\/'" to make sure that that read > # would consume all the available > # output. Unfortunately we cannot do this as we cannot > # control when sed flushes its output. We could write > # a test helper in C that appended a '\' to the end of > # each line and flushes its output after every line. > git cat-file --buffer --batch-command <input 2>err & > echo $! && > wait $! > echo $? > ) >&9 & > sh_pid=$! && > read cat_file_pid <&9 && > test_when_finished "kill $cat_file_pid > kill $sh_pid; wait $sh_pid; :" && > ( > test_write_lines "info $sha1" fflush "info $sha1" && > # TODO - consume all available input, not just one > # line (see above). > read actual <&9 && > echo "$actual" >actual && > echo "$sha1 $type $size" >expect && > test_cmp expect actual > ) >input && > # check output is flushed on exit > read actual <&9 && > echo "$actual" >actual && > test_cmp expect actual && > test_must_be_empty err && > read status <&9 && > test "$status" -eq 0 > } > > run_buffer_test_no_flush () { > type=$1 > sha1=$2 > size=$3 > > mkfifo input && > test_when_finished 'rm input' > mkfifo pid && > exec 9<>pid && > test_when_finished 'rm pid; exec 9<&-' > ( > git cat-file --buffer --batch-command <input >output & > echo $! && > wait $! > echo $? > ) >&9 & > sh_pid=$! && > read cat_file_pid <&9 && > test_when_finished "kill $cat_file_pid > kill $sh_pid; wait $sh_pid; :" && > ( > test_write_lines "info $sha1" "info $sha1" && > kill $cat_file_pid && > read status <&9 && > test "$status" -ne 0 && > test_must_be_empty output > ) >input > } > > Best Wishes > > Phillip > >> + test -z "$content" || >> + test_expect_success "--batch-command output of $type content is correct" ' >> + maybe_remove_timestamp "$batch_output" $no_ts >expect && >> + maybe_remove_timestamp "$(echo object $sha1 | git cat-file --batch-command)" $no_ts >actual && >> + test_cmp expect actual >> + ' >> + >> + test_expect_success "batch-command output of $type info is correct" ' >> + echo "$sha1 $type $size" >expect && >> + echo "info $sha1" | git cat-file --batch-command >actual && >> + test_cmp expect actual >> +' >> test_expect_success "custom --batch-check format" ' >> echo "$type $sha1" >expect && >> echo $sha1 | git cat-file --batch-check="%(objecttype) %(objectname)" >actual && >> @@ -232,12 +284,29 @@ test_expect_success '--batch-check without %(rest) considers whole line' ' >> test_cmp expect actual >> ' >> +test_expect_success '--batch-command --buffer with flush is correct for blob' ' >> + run_buffer_test 'blob' $hello_sha1 $hello_size true >> +' >> + >> +test_expect_success '--batch-command --buffer without flush is correct for blob' ' >> + run_buffer_test 'blob' $hello_sha1 $hello_size false >> +' >> + >> tree_sha1=$(git write-tree) >> + >> tree_size=$(($(test_oid rawsz) + 13)) >> tree_pretty_content="100644 blob $hello_sha1 hello" >> run_tests 'tree' $tree_sha1 $tree_size "" "$tree_pretty_content" >> +test_expect_success '--batch-command --buffer with flush is correct for tree' ' >> + run_buffer_test 'tree' $tree_sha1 $tree_size true >> +' >> + >> +test_expect_success '--batch-command --buffer without flush is correct for tree' ' >> + run_buffer_test 'tree' $tree_sha1 $tree_size false >> +' >> + >> commit_message="Initial commit" >> commit_sha1=$(echo_without_newline "$commit_message" | git commit-tree $tree_sha1) >> commit_size=$(($(test_oid hexsz) + 137)) >> @@ -263,6 +332,14 @@ tag_size=$(strlen "$tag_content") >> run_tests 'tag' $tag_sha1 $tag_size "$tag_content" "$tag_content" 1 >> +test_expect_success '--batch-command --buffer with flush is correct for tag' ' >> + run_buffer_test 'tag' $tag_sha1 $tag_size true >> +' >> + >> +test_expect_success '--batch-command --buffer without flush is correct for tag' ' >> + run_buffer_test 'tag' $tag_sha1 $tag_size false >> +' >> + >> test_expect_success \ >> "Reach a blob from a tag pointing to it" \ >> "test '$hello_content' = \"\$(git cat-file blob $tag_sha1)\"" >> @@ -964,4 +1041,10 @@ test_expect_success 'cat-file --batch-all-objects --batch-check ignores replace' >> test_cmp expect actual >> ' >> +test_expect_success 'batch-command unknown command' ' >> + echo unknown_command >cmd && >> + test_expect_code 128 git cat-file --batch-command < cmd 2>err && >> + grep -E "^fatal:.*unknown command.*" err >> +' >> + >> test_done
diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt index bef76f4dd0..254e546c79 100644 --- a/Documentation/git-cat-file.txt +++ b/Documentation/git-cat-file.txt @@ -96,6 +96,21 @@ OPTIONS need to specify the path, separated by whitespace. See the section `BATCH OUTPUT` below for details. +--batch-command:: + Enter a command mode that reads from stdin. May not be combined with any + other options or arguments except `--textconv` or `--filters`, in which + case the input lines also need to specify the path, separated by + whitespace. See the section `BATCH OUTPUT` below for details. + +object <object>:: + Print object contents for object reference <object> + +info <object>:: + Print object info for object reference <object> + +flush:: + Flush to stdout immediately when used with --buffer + --batch-all-objects:: Instead of reading a list of objects on stdin, perform the requested batch operation on all objects in the repository and diff --git a/builtin/cat-file.c b/builtin/cat-file.c index 7b3f42950e..cc9e47943b 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -24,9 +24,11 @@ struct batch_options { int buffer_output; int all_objects; int unordered; - int cmdmode; /* may be 'w' or 'c' for --filters or --textconv */ + int mode; /* may be 'w' or 'c' for --filters or --textconv */ const char *format; + int command; }; +static char line_termination = '\n'; static const char *force_path; @@ -302,19 +304,19 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d if (data->type == OBJ_BLOB) { if (opt->buffer_output) fflush(stdout); - if (opt->cmdmode) { + if (opt->mode) { char *contents; unsigned long size; if (!data->rest) die("missing path for '%s'", oid_to_hex(oid)); - if (opt->cmdmode == 'w') { + if (opt->mode == 'w') { if (filter_object(data->rest, 0100644, oid, &contents, &size)) die("could not convert '%s' %s", oid_to_hex(oid), data->rest); - } else if (opt->cmdmode == 'c') { + } else if (opt->mode == 'c') { enum object_type type; if (!textconv_object(the_repository, data->rest, 0100644, oid, @@ -326,7 +328,7 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d die("could not convert '%s' %s", oid_to_hex(oid), data->rest); } else - BUG("invalid cmdmode: %c", opt->cmdmode); + BUG("invalid mode: %c", opt->mode); batch_write(opt, contents, size); free(contents); } else { @@ -508,6 +510,95 @@ static int batch_unordered_packed(const struct object_id *oid, data); } +static void parse_cmd_object(struct batch_options *opt, + const char *line, + struct strbuf *output, + struct expand_data *data) +{ + opt->print_contents = 1; + batch_one_object(line, output, opt, data); +} + +static void parse_cmd_info(struct batch_options *opt, + const char *line, + struct strbuf *output, + struct expand_data *data) +{ + opt->print_contents = 0; + batch_one_object(line, output, opt, data); +} + +static void parse_cmd_fflush(struct batch_options *opt, + const char *line, + struct strbuf *output, + struct expand_data *data) +{ + fflush(stdout); +} + +typedef void (*parse_cmd_fn_t)(struct batch_options *, const char *, + struct strbuf *, struct expand_data *); + +static const struct parse_cmd { + const char *prefix; + parse_cmd_fn_t fn; + unsigned takes_args; +} commands[] = { + { "object", parse_cmd_object, 1}, + { "info", parse_cmd_info, 1}, + { "fflush", parse_cmd_fflush, 0}, +}; + +static void batch_objects_command(struct batch_options *opt, + struct strbuf *output, + struct expand_data *data) +{ + struct strbuf input = STRBUF_INIT; + + /* Read each line dispatch its command */ + while (!strbuf_getwholeline(&input, stdin, line_termination)) { + int i; + const struct parse_cmd *cmd = NULL; + const char *p, *cmd_end; + + if (*input.buf == line_termination) + die("empty command in input"); + else if (isspace(*input.buf)) + die("whitespace before command: %s", input.buf); + + for (i = 0; i < ARRAY_SIZE(commands); i++) { + const char *prefix = commands[i].prefix; + char c; + if (!skip_prefix(input.buf, prefix, &cmd_end)) + continue; + /* + * If the command has arguments, verify that it's + * followed by a space. Otherwise, it shall be followed + * by a line terminator. + */ + c = commands[i].takes_args ? ' ' : line_termination; + if (input.buf[strlen(prefix)] != c) + die("arguments invalid for command: %s", commands[i].prefix); + + cmd = &commands[i]; + if (cmd->takes_args) { + p = cmd_end + 1; + // strip newline before handing it to the + // handling function + input.buf[strcspn(input.buf, "\n")] = '\0'; + } + + break; + } + + if (!cmd) + die("unknown command: %s", input.buf); + + cmd->fn(opt, p, output, data); + } + strbuf_release(&input); +} + static int batch_objects(struct batch_options *opt) { struct strbuf input = STRBUF_INIT; @@ -515,6 +606,7 @@ static int batch_objects(struct batch_options *opt) struct expand_data data; int save_warning; int retval = 0; + const int command = opt->command; if (!opt->format) opt->format = "%(objectname) %(objecttype) %(objectsize)"; @@ -529,7 +621,7 @@ static int batch_objects(struct batch_options *opt) strbuf_expand(&output, opt->format, expand_format, &data); data.mark_query = 0; strbuf_release(&output); - if (opt->cmdmode) + if (opt->mode) data.split_on_whitespace = 1; /* @@ -590,6 +682,9 @@ static int batch_objects(struct batch_options *opt) save_warning = warn_on_object_refname_ambiguity; warn_on_object_refname_ambiguity = 0; + if (command) + batch_objects_command(opt, &output, &data); + while (strbuf_getline(&input, stdin) != EOF) { if (data.split_on_whitespace) { /* @@ -636,6 +731,7 @@ static int batch_option_callback(const struct option *opt, bo->enabled = 1; bo->print_contents = !strcmp(opt->long_name, "batch"); + bo->command = !strcmp(opt->long_name, "batch-command"); bo->format = arg; return 0; @@ -683,6 +779,10 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix) N_("like --batch, but don't emit <contents>"), PARSE_OPT_OPTARG | PARSE_OPT_NONEG, batch_option_callback), + OPT_CALLBACK_F(0, "batch-command", &batch, N_(""), + N_("enters batch mode that accepts commands"), + PARSE_OPT_NOARG | PARSE_OPT_NONEG, + batch_option_callback), OPT_CMDMODE(0, "batch-all-objects", &opt, N_("with --batch[-check]: ignores stdin, batches all known objects"), 'b'), /* Batch-specific options */ @@ -742,7 +842,7 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix) /* Return early if we're in batch mode? */ if (batch.enabled) { if (opt_cw) - batch.cmdmode = opt; + batch.mode = opt; else if (opt && opt != 'b') usage_msg_optf(_("'-%c' is incompatible with batch mode"), usage, options, opt); diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh index 145eee11df..92f0b14a95 100755 --- a/t/t1006-cat-file.sh +++ b/t/t1006-cat-file.sh @@ -112,6 +112,46 @@ maybe_remove_timestamp () { fi } +run_buffer_test () { + type=$1 + sha1=$2 + size=$3 + flush=$4 + + mkfifo V.input + exec 8<>V.input + rm V.input + + mkfifo V.output + exec 9<>V.output + rm V.output + + ( + git cat-file --buffer --batch-command <&8 >&9 & + echo $! >&9 && + wait $! + echo >&9 EARLY_EXIT + ) & + sh_pid=$! + read fi_pid <&9 + test_when_finished " + exec 8>&-; exec 9>&-; + kill $sh_pid && wait $sh_pid + kill $fi_pid && wait $fi_pid + true" + expect=$(echo "$sha1 $type $size") + echo "info $sha1" >&8 + if test "$flush" = "true" + then + echo "fflush" >&8 + else + expect="EARLY_EXIT" + kill $fi_pid && wait $fi_pid + fi + read actual <&9 + test "$actual" = "$expect" +} + run_tests () { type=$1 sha1=$2 @@ -177,6 +217,18 @@ $content" test_cmp expect actual ' + test -z "$content" || + test_expect_success "--batch-command output of $type content is correct" ' + maybe_remove_timestamp "$batch_output" $no_ts >expect && + maybe_remove_timestamp "$(echo object $sha1 | git cat-file --batch-command)" $no_ts >actual && + test_cmp expect actual + ' + + test_expect_success "batch-command output of $type info is correct" ' + echo "$sha1 $type $size" >expect && + echo "info $sha1" | git cat-file --batch-command >actual && + test_cmp expect actual +' test_expect_success "custom --batch-check format" ' echo "$type $sha1" >expect && echo $sha1 | git cat-file --batch-check="%(objecttype) %(objectname)" >actual && @@ -232,12 +284,29 @@ test_expect_success '--batch-check without %(rest) considers whole line' ' test_cmp expect actual ' +test_expect_success '--batch-command --buffer with flush is correct for blob' ' + run_buffer_test 'blob' $hello_sha1 $hello_size true +' + +test_expect_success '--batch-command --buffer without flush is correct for blob' ' + run_buffer_test 'blob' $hello_sha1 $hello_size false +' + tree_sha1=$(git write-tree) + tree_size=$(($(test_oid rawsz) + 13)) tree_pretty_content="100644 blob $hello_sha1 hello" run_tests 'tree' $tree_sha1 $tree_size "" "$tree_pretty_content" +test_expect_success '--batch-command --buffer with flush is correct for tree' ' + run_buffer_test 'tree' $tree_sha1 $tree_size true +' + +test_expect_success '--batch-command --buffer without flush is correct for tree' ' + run_buffer_test 'tree' $tree_sha1 $tree_size false +' + commit_message="Initial commit" commit_sha1=$(echo_without_newline "$commit_message" | git commit-tree $tree_sha1) commit_size=$(($(test_oid hexsz) + 137)) @@ -263,6 +332,14 @@ tag_size=$(strlen "$tag_content") run_tests 'tag' $tag_sha1 $tag_size "$tag_content" "$tag_content" 1 +test_expect_success '--batch-command --buffer with flush is correct for tag' ' + run_buffer_test 'tag' $tag_sha1 $tag_size true +' + +test_expect_success '--batch-command --buffer without flush is correct for tag' ' + run_buffer_test 'tag' $tag_sha1 $tag_size false +' + test_expect_success \ "Reach a blob from a tag pointing to it" \ "test '$hello_content' = \"\$(git cat-file blob $tag_sha1)\"" @@ -964,4 +1041,10 @@ test_expect_success 'cat-file --batch-all-objects --batch-check ignores replace' test_cmp expect actual ' +test_expect_success 'batch-command unknown command' ' + echo unknown_command >cmd && + test_expect_code 128 git cat-file --batch-command < cmd 2>err && + grep -E "^fatal:.*unknown command.*" err +' + test_done
This RFC patch proposes a new flag --batch-command that works with git-cat-file --batch. Similar to git-update-ref --stdin, it will accept commands and arguments from stdin. The start of this idea was discussed in [1], where the original motivation was to be able to control when the buffer was flushed to stdout in --buffer mode. However, this can actually be much more useful in situations when git-cat-file --batch is being used as a long lived backend query process. At GitLab, we use a pair of cat-file processes. One for iterating over object metadata with --batch-check, and the other to grab object contents with --batch. However, if we had --batch-command, we could get rid of the second --batch-check process, and just have one process where we can flip between getting object info, and getting object contents. This can lead to huge savings since on a given server there could be hundreds to thousands of git cat-file processes at a time. git cat-file --batch-command $ <command> [arg1] [arg2] NL This patch adds three commands: object, info, fflush $ object <sha1> NL $ info <sha1> NL $ fflush NL These three would be immediately useful in GitLab's context, but one can imagine this mode to be further extended for other things. Future improvements: - a non-trivial part of "cat-file --batch" time is spent on parsing its argument and seeing if it's a revision, ref etc. So we could add a command that only accepts a full-length 40 character SHA-1. This would be the first step in adding such an interface to git-cat-file. [1] https://lore.kernel.org/git/pull.1124.git.git.1636149400.gitgitgadget@gmail.com/ Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: John Cai <johncai86@gmail.com> --- Taylor, I'd be interested in your thoughts on this proposal since you helped review the previous patch that turned into this RFC. Thanks! Changes from v2: - refactored tests to be within run_tests() - added a test to test --buffer behavior with fflush - cleaned up cat-file.c: clarified var names, removed unnecessary code based on suggestions from Phillip Wood - removed strvec changes Changes from v1: - changed option name to batch-command. - changed command function interface to receive the whole line after the command name to put the onus of parsing arguments to each individual command function. - pass in whole line to batch_one_object in both parse_cmd_object and parse_cmd_info to support spaces in the object reference. - removed addition of -z to include in a separate patch series - added documentation. --- Documentation/git-cat-file.txt | 15 +++++ builtin/cat-file.c | 114 +++++++++++++++++++++++++++++++-- t/t1006-cat-file.sh | 83 ++++++++++++++++++++++++ 3 files changed, 205 insertions(+), 7 deletions(-)