Message ID | 8d1e1eaa70b96779416f2f48a862d31a730c4521.1718130288.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | mktree: support more flexible usage | expand |
"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Victoria Dye <vdye@github.com> > > Replace the custom input parsing of 'mktree' with 'read_index_info()', which > handles not only the 'ls-tree' output format it already handles but also the > other formats compatible with 'update-index'. Yay. > This lends some consistency > across the commands (avoiding the need for two similar implementations for > input parsing) and adds flexibility to mktree. > > Update 'Documentation/git-mktree.txt' to reflect the more permissive input > format. Nice. > DESCRIPTION > ----------- > -Reads standard input in non-recursive `ls-tree` output format, and creates > -a tree object. The order of the tree entries is normalized by mktree so > -pre-sorting the input is not required. The object name of the tree object > -built is written to the standard output. > +Reads entry information from stdin and creates a tree object from those entries. > +The object name of the tree object built is written to the standard output. pre-sorting is now required? Ah, such details are left to the section dedicated for the input format. Makes sense. The line is getting overly long (the first line now is exactly 80-columns); wrapping them to leave a bit of room to grow, like at around 72-76 columns, would be appreciated. > +INPUT FORMAT > +------------ > +Tree entries may be specified in any of the formats compatible with the > +`--index-info` option to linkgit:git-update-index[1]. The order of the tree > +entries is normalized by `mktree` so pre-sorting the input by path is not > +required. OK. We might want to split the description of the three-formats into a separate file and include it in here and in the original (I'd certainly insist doing so if we had three places that want to refer to it), but we have only two so let's just remember to do so when we may want to add the third place in the future. > diff --git a/builtin/mktree.c b/builtin/mktree.c > index 15bd908702a..5530257252d 100644 > --- a/builtin/mktree.c > +++ b/builtin/mktree.c > @@ -6,6 +6,7 @@ > #include "builtin.h" > #include "gettext.h" > #include "hex.h" > +#include "index-info.h" > #include "quote.h" > #include "strbuf.h" > #include "tree.h" > @@ -93,123 +94,80 @@ static const char *mktree_usage[] = { > NULL > }; > > -static void mktree_line(char *buf, int nul_term_line, int allow_missing, > - struct tree_entry_array *arr) > +struct mktree_line_data { > + struct tree_entry_array *arr; > + int allow_missing; > +}; > + > +static int mktree_line(unsigned int mode, struct object_id *oid, > + enum object_type obj_type, int stage UNUSED, > + const char *path, void *cbdata) > { > + struct mktree_line_data *data = cbdata; > + enum object_type mode_type = object_type(mode); > struct object_info oi = OBJECT_INFO_INIT; > + enum object_type parsed_obj_type; > > + if (obj_type && mode_type != obj_type) > + die("object type (%s) doesn't match mode type (%s)", > + type_name(obj_type), type_name(mode_type)); > > + oi.typep = &parsed_obj_type; > > + if (oid_object_info_extended(the_repository, oid, &oi, > OBJECT_INFO_LOOKUP_REPLACE | > OBJECT_INFO_QUICK | > OBJECT_INFO_SKIP_FETCH_OBJECT) < 0) > + parsed_obj_type = -1; > > + if (parsed_obj_type < 0) { > + if (data->allow_missing || S_ISGITLINK(mode)) { > + ; /* no problem - missing objects & submodules are presumed to be of the right type */ Overlong line? > } else { > + die("entry '%s' object %s is unavailable", path, oid_to_hex(oid)); > } Each side of if/else has only a single statement block that does not want {braces} around it. I wonder if flipping the polarity makes it easier to follow the logic flow: if (!data->allow_missing && !S_ISGITLINK(mode)) die("..."); I wonder if we even want to do the oid_object_info_extended() when we are expecting to see a gitlink. We do not expect to have the commit in our history (as it is part of the history of a submodule, which is from a separate project), so even if we found such an object in our object database, we do not want to do anything with the information we learn about the object. So I am wondering if the whole cascade should read more like if (S_ISGITILNK(mode)) { ... anything goes ... } else if (oid_object_info_extended(...) < 0 && !data->allow_missing) { ... not found ... } else if (parsed_obj_type != mode_type) { ... found something different from what we expected ... } The main loop, thanks to read_index_info() refactoring, got really easier to read, i.e. compact and clear.
On Tue, Jun 11, 2024 at 06:24:39PM +0000, Victoria Dye via GitGitGadget wrote: > diff --git a/Documentation/git-mktree.txt b/Documentation/git-mktree.txt > index 383f09dd333..507682ed23e 100644 > --- a/Documentation/git-mktree.txt > +++ b/Documentation/git-mktree.txt > @@ -13,15 +13,13 @@ SYNOPSIS > > DESCRIPTION > ----------- > -Reads standard input in non-recursive `ls-tree` output format, and creates > -a tree object. The order of the tree entries is normalized by mktree so > -pre-sorting the input is not required. The object name of the tree object > -built is written to the standard output. > +Reads entry information from stdin and creates a tree object from those entries. > +The object name of the tree object built is written to the standard output. It makes perfect sense to not single out git-ls-tree(1) anymore. But I think we should help the reader a bit by continuing to point out which commands can be used as input here. That can be either here in the description, further down in the new "INPUT FORMAT" section, or in both places. Patrick
Patrick Steinhardt <ps@pks.im> writes: > It makes perfect sense to not single out git-ls-tree(1) anymore. But I > think we should help the reader a bit by continuing to point out which > commands can be used as input here. That can be either here in the > description, further down in the new "INPUT FORMAT" section, or in both > places. Here is a way to do so, which I alluded to earlier. The original text is too specific to "update-index" in that it talked about "stuffing them into the index", which does not apply in the context of "mktree". And then it made me realize that "ls-files -s" output has the stage information, which of course is needed for "update-index" to be able to recreate the index state from a textual dump, but "mktree" should reject if given a higher stage entry. It seems that the code after applying all these 16 patches does not diagnose it as an error if you feed a non-zero stage. The callback starts like so. static int mktree_line(unsigned int mode, struct object_id *oid, enum object_type obj_type, int stage UNUSED, const char *path, void *cbdata) { I _think_ it should be made an error if the input has non-zero stage, which would be a sign that it was taken from "ls-files -s" (or even "ls-files -u"), out of which "git write-tree" will REFUSE to create a tree object. "mktree" should behave the same way, no? In any case, here is the documentation split/refactor. Documentation/git-mktree.txt | 4 +++- Documentation/git-update-index.txt | 14 +------------- Documentation/index-info-formats.txt | 13 +++++++++++++ 3 files changed, 17 insertions(+), 14 deletions(-) diff --git c/Documentation/git-mktree.txt w/Documentation/git-mktree.txt index a660438c67..fefaa83d29 100644 --- c/Documentation/git-mktree.txt +++ w/Documentation/git-mktree.txt @@ -48,7 +48,9 @@ OPTIONS INPUT FORMAT ------------ Tree entries may be specified in any of the formats compatible with the -`--index-info` option to linkgit:git-update-index[1]. +`--index-info` option to linkgit:git-update-index[1]. That is: + +include::index-info-formats.txt[] Entries may use full pathnames containing directory separators to specify entries nested within one or more directories. These entries are inserted into diff --git c/Documentation/git-update-index.txt w/Documentation/git-update-index.txt index 7128aed540..2287a5d4be 100644 --- c/Documentation/git-update-index.txt +++ w/Documentation/git-update-index.txt @@ -280,19 +280,7 @@ USING --INDEX-INFO multiple entry definitions from the standard input, and designed specifically for scripts. It can take inputs of three formats: - . mode SP type SP sha1 TAB path -+ -This format is to stuff `git ls-tree` output into the index. - - . mode SP sha1 SP stage TAB path -+ -This format is to put higher order stages into the -index file and matches 'git ls-files --stage' output. - - . mode SP sha1 TAB path -+ -This format is no longer produced by any Git command, but is -and will continue to be supported by `update-index --index-info`. +include::index-info-formats.txt[] To place a higher stage entry to the index, the path should first be removed by feeding a mode=0 entry for the path, and diff --git c/Documentation/index-info-formats.txt w/Documentation/index-info-formats.txt new file mode 100644 index 0000000000..037ebd2432 --- /dev/null +++ w/Documentation/index-info-formats.txt @@ -0,0 +1,13 @@ + . mode SP type SP sha1 TAB path ++ +This format is to use `git ls-tree` output. + + . mode SP sha1 SP stage TAB path ++ +This format allows higher order stages to appear and +matches 'git ls-files --stage' output. + + . mode SP sha1 TAB path ++ +This format is no longer produced by any Git command, but is +and will continue to be supported.
diff --git a/Documentation/git-mktree.txt b/Documentation/git-mktree.txt index 383f09dd333..507682ed23e 100644 --- a/Documentation/git-mktree.txt +++ b/Documentation/git-mktree.txt @@ -3,7 +3,7 @@ git-mktree(1) NAME ---- -git-mktree - Build a tree-object from ls-tree formatted text +git-mktree - Build a tree-object from formatted tree entries SYNOPSIS @@ -13,15 +13,13 @@ SYNOPSIS DESCRIPTION ----------- -Reads standard input in non-recursive `ls-tree` output format, and creates -a tree object. The order of the tree entries is normalized by mktree so -pre-sorting the input is not required. The object name of the tree object -built is written to the standard output. +Reads entry information from stdin and creates a tree object from those entries. +The object name of the tree object built is written to the standard output. OPTIONS ------- -z:: - Read the NUL-terminated `ls-tree -z` output instead. + Input lines are separated with NUL rather than LF. --missing:: Allow missing objects. The default behaviour (without this option) @@ -35,6 +33,13 @@ OPTIONS optional. Note - if the `-z` option is used, lines are terminated with NUL. +INPUT FORMAT +------------ +Tree entries may be specified in any of the formats compatible with the +`--index-info` option to linkgit:git-update-index[1]. The order of the tree +entries is normalized by `mktree` so pre-sorting the input by path is not +required. + GIT --- Part of the linkgit:git[1] suite diff --git a/builtin/mktree.c b/builtin/mktree.c index 15bd908702a..5530257252d 100644 --- a/builtin/mktree.c +++ b/builtin/mktree.c @@ -6,6 +6,7 @@ #include "builtin.h" #include "gettext.h" #include "hex.h" +#include "index-info.h" #include "quote.h" #include "strbuf.h" #include "tree.h" @@ -93,123 +94,80 @@ static const char *mktree_usage[] = { NULL }; -static void mktree_line(char *buf, int nul_term_line, int allow_missing, - struct tree_entry_array *arr) +struct mktree_line_data { + struct tree_entry_array *arr; + int allow_missing; +}; + +static int mktree_line(unsigned int mode, struct object_id *oid, + enum object_type obj_type, int stage UNUSED, + const char *path, void *cbdata) { - char *ptr, *ntr; - const char *p; - unsigned mode; - enum object_type mode_type; /* object type derived from mode */ - enum object_type obj_type; /* object type derived from sha */ + struct mktree_line_data *data = cbdata; + enum object_type mode_type = object_type(mode); struct object_info oi = OBJECT_INFO_INIT; - char *path, *to_free = NULL; - struct object_id oid; + enum object_type parsed_obj_type; - ptr = buf; - /* - * Read non-recursive ls-tree output format: - * mode SP type SP sha1 TAB name - */ - mode = strtoul(ptr, &ntr, 8); - if (ptr == ntr || !ntr || *ntr != ' ') - die("input format error: %s", buf); - ptr = ntr + 1; /* type */ - ntr = strchr(ptr, ' '); - if (!ntr || parse_oid_hex(ntr + 1, &oid, &p) || - *p != '\t') - die("input format error: %s", buf); - - /* It is perfectly normal if we do not have a commit from a submodule */ - if (S_ISGITLINK(mode)) - allow_missing = 1; - - - *ntr++ = 0; /* now at the beginning of SHA1 */ - - path = (char *)p + 1; /* at the beginning of name */ - if (!nul_term_line && path[0] == '"') { - struct strbuf p_uq = STRBUF_INIT; - if (unquote_c_style(&p_uq, path, NULL)) - die("invalid quoting"); - path = to_free = strbuf_detach(&p_uq, NULL); - } + if (obj_type && mode_type != obj_type) + die("object type (%s) doesn't match mode type (%s)", + type_name(obj_type), type_name(mode_type)); - /* - * Object type is redundantly derivable three ways. - * These should all agree. - */ - mode_type = object_type(mode); - if (mode_type != type_from_string(ptr)) { - die("entry '%s' object type (%s) doesn't match mode type (%s)", - path, ptr, type_name(mode_type)); - } + oi.typep = &parsed_obj_type; - /* Check the type of object identified by oid without fetching objects */ - oi.typep = &obj_type; - if (oid_object_info_extended(the_repository, &oid, &oi, + if (oid_object_info_extended(the_repository, oid, &oi, OBJECT_INFO_LOOKUP_REPLACE | OBJECT_INFO_QUICK | OBJECT_INFO_SKIP_FETCH_OBJECT) < 0) - obj_type = -1; + parsed_obj_type = -1; - if (obj_type < 0) { - if (allow_missing) { - ; /* no problem - missing objects are presumed to be of the right type */ + if (parsed_obj_type < 0) { + if (data->allow_missing || S_ISGITLINK(mode)) { + ; /* no problem - missing objects & submodules are presumed to be of the right type */ } else { - die("entry '%s' object %s is unavailable", path, oid_to_hex(&oid)); - } - } else { - if (obj_type != mode_type) { - /* - * The object exists but is of the wrong type. - * This is a problem regardless of allow_missing - * because the new tree entry will never be correct. - */ - die("entry '%s' object %s is a %s but specified type was (%s)", - path, oid_to_hex(&oid), type_name(obj_type), type_name(mode_type)); + die("entry '%s' object %s is unavailable", path, oid_to_hex(oid)); } + } else if (parsed_obj_type != mode_type) { + /* + * The object exists but is of the wrong type. + * This is a problem regardless of allow_missing + * because the new tree entry will never be correct. + */ + die("entry '%s' object %s is a %s but specified type was (%s)", + path, oid_to_hex(oid), type_name(parsed_obj_type), type_name(mode_type)); } - append_to_tree(mode, &oid, path, arr); - free(to_free); + append_to_tree(mode, oid, path, data->arr); + return 0; } int cmd_mktree(int ac, const char **av, const char *prefix) { - struct strbuf sb = STRBUF_INIT; struct object_id oid; int nul_term_line = 0; - int allow_missing = 0; int is_batch_mode = 0; - int got_eof = 0; struct tree_entry_array arr = { 0 }; - strbuf_getline_fn getline_fn; + struct mktree_line_data mktree_line_data = { .arr = &arr }; + int ret; const struct option option[] = { OPT_BOOL('z', NULL, &nul_term_line, N_("input is NUL terminated")), - OPT_BOOL(0, "missing", &allow_missing, N_("allow missing objects")), + OPT_BOOL(0, "missing", &mktree_line_data.allow_missing, N_("allow missing objects")), OPT_BOOL(0, "batch", &is_batch_mode, N_("allow creation of more than one tree")), OPT_END() }; ac = parse_options(ac, av, prefix, option, mktree_usage, 0); - getline_fn = nul_term_line ? strbuf_getline_nul : strbuf_getline_lf; - - while (!got_eof) { - while (1) { - if (getline_fn(&sb, stdin) == EOF) { - got_eof = 1; - break; - } - if (sb.buf[0] == '\0') { - /* empty lines denote tree boundaries in batch mode */ - if (is_batch_mode) - break; - die("input format error: (blank line only valid in batch mode)"); - } - mktree_line(sb.buf, nul_term_line, allow_missing, &arr); - } - if (is_batch_mode && got_eof && arr.nr < 1) { + + do { + ret = read_index_info(nul_term_line, mktree_line, &mktree_line_data); + if (ret < 0) + break; + + /* empty lines denote tree boundaries in batch mode */ + if (ret > 0 && !is_batch_mode) + die("input format error: (blank line only valid in batch mode)"); + + if (is_batch_mode && !ret && arr.nr < 1) { /* * Execution gets here if the last tree entry is terminated with a * new-line. The final new-line has been made optional to be @@ -222,9 +180,8 @@ int cmd_mktree(int ac, const char **av, const char *prefix) fflush(stdout); } clear_tree_entry_array(&arr); /* reset tree entry buffer for re-use in batch mode */ - } + } while (ret > 0); release_tree_entry_array(&arr); - strbuf_release(&sb); - return 0; + return !!ret; } diff --git a/t/t1010-mktree.sh b/t/t1010-mktree.sh index 22875ba598c..9b2ab0c97ad 100755 --- a/t/t1010-mktree.sh +++ b/t/t1010-mktree.sh @@ -54,11 +54,36 @@ test_expect_success 'ls-tree output in wrong order given to mktree (2)' ' test_cmp tree.withsub actual ' +test_expect_success '--batch creates multiple trees' ' + cat top >multi-tree && + echo "" >>multi-tree && + cat top.withsub >>multi-tree && + + cat tree >expect && + cat tree.withsub >>expect && + git mktree --batch <multi-tree >actual && + test_cmp expect actual +' + test_expect_success 'allow missing object with --missing' ' git mktree --missing <top.missing >actual && test_cmp tree.missing actual ' +test_expect_success 'mktree with invalid submodule OIDs' ' + # non-existent OID - ok + printf "160000 commit $(test_oid numeric)\tA\n" >in && + git mktree <in >tree.actual && + git ls-tree $(cat tree.actual) >actual && + test_cmp in actual && + + # existing OID, wrong type - error + tree_oid="$(cat tree)" && + printf "160000 commit $tree_oid\tA" | + test_must_fail git mktree 2>err && + grep "object $tree_oid is a tree but specified type was (commit)" err +' + test_expect_success 'mktree refuses to read ls-tree -r output (1)' ' test_must_fail git mktree <all ' @@ -67,4 +92,45 @@ test_expect_success 'mktree refuses to read ls-tree -r output (2)' ' test_must_fail git mktree <all.withsub ' +test_expect_success 'mktree fails on malformed input' ' + # empty line without --batch + echo "" | + test_must_fail git mktree 2>err && + grep "blank line only valid in batch mode" err && + + # bad whitespace + printf "100644 blob $EMPTY_BLOB A" | + test_must_fail git mktree 2>err && + grep "malformed input line" err && + + # invalid type + printf "100644 bad $EMPTY_BLOB\tA" | + test_must_fail git mktree 2>err && + grep "invalid object type" err && + + # invalid OID length + printf "100755 blob abc123\tA" | + test_must_fail git mktree 2>err && + grep "malformed input line" err && + + # bad quoting + printf "100644 blob $EMPTY_BLOB\t\"A" | + test_must_fail git mktree 2>err && + grep "bad quoting of path name" err +' + +test_expect_success 'mktree fails on mode mismatch' ' + tree_oid="$(cat tree)" && + + # mode-type mismatch + printf "100644 tree $tree_oid\tA" | + test_must_fail git mktree 2>err && + grep "object type (tree) doesn${SQ}t match mode type (blob)" err && + + # mode-object mismatch (no --missing) + printf "100644 $tree_oid\tA" | + test_must_fail git mktree 2>err && + grep "object $tree_oid is a tree but specified type was (blob)" err +' + test_done